Malware has historically operated on deterministic logic. A payload executes a predefined sequence—exploit a vulnerability, establish persistence, exfiltrate data—using code that was static from the moment of compilation. The attacker's intelligence was embedded before deployment; the malware itself was, in computational terms, unintelligent.
That assumption no longer holds. The MDPI AI survey on AI-driven cybersecurity, cross-referenced with Trend Micro's first-half 2025 threat intelligence report, documents the emergence of PROMPTFLUX and PROMPTSTEAL—the first malware families that invoke large language models at runtime. These are not AI-assisted attack tools where a human operator uses ChatGPT to write phishing emails. These are autonomous malicious programs that call LLM APIs during execution to adapt their behavior, generate context-appropriate social engineering content, and modify their attack strategies based on the target environment.
The distinction matters. A static phishing template can be fingerprinted and blocked. A malware instance that generates unique, contextually tailored communications on every execution—drawing on an LLM's ability to mimic writing styles, respond to security prompts, and construct plausible pretexts—presents a fundamentally different detection challenge.
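A minimal sketch of why exact-match fingerprinting breaks down once the text varies on every run. The template, the blocklist, and the helper names `fingerprint` and `is_blocked` are hypothetical, chosen only for illustration. Production filters use fuzzier signatures than a plain hash, but the underlying limitation is similar.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Lowercase the text, collapse whitespace, and return its SHA-256 hex digest."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical blocklist built from one previously observed phishing template.
KNOWN_TEMPLATE = "Your account has been suspended. Click here to verify your identity."
BLOCKLIST = {fingerprint(KNOWN_TEMPLATE)}


def is_blocked(message: str) -> bool:
    """Return True when the message's fingerprint matches a known template."""
    return fingerprint(message) in BLOCKLIST


# The same template resent with different casing and spacing still matches.
resent = "Your account has been  SUSPENDED. Click here to verify your identity."

# A reworded message with the same intent produces a different digest.
reworded = "We noticed unusual activity and paused your account. Please confirm who you are."

print(is_blocked(resent))
print(is_blocked(reworded))
```

The resent copy is caught because normalization removes the superficial differences. The reworded message carries the same pretext but shares no fingerprint with the template, which is the situation a defender faces when every message is generated fresh.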
The Research Landscape: AI in Cybersecurity, Both Sides
The dual-use nature of AI in security is not new in principle, but in 2025 adoption accelerated on offense and defense at the same time.
Defensive adoption: AI cybersecurity tool adoption has increased substantially according to the surveyed literature, driven primarily by the need to process alert volumes that exceed human analyst capacity. Security operations centers (SOCs) now routinely deploy ML-based anomaly detection, natural language processing for log analysis, and automated incident triage. The economic logic is straightforward: the volume of network telemetry, endpoint events, and application logs generated by modern infrastructure exceeds what human teams can review, and attack dwell times punish slow detection.
Offensive evolution: On the adversarial side, LLMs have lowered the skill barrier for constructing sophisticated attacks. Pre-LLM social engineering required linguistic competence in the target's language, knowledge of organizational context, and the patience to craft individualized messages. LLMs commoditize all three capabilities. What PROMPTFLUX and PROMPTSTEAL add is the removal of the human operator from the loop entirely—the malware itself handles the adaptive communication.
Institutional awareness: A large majority of security leaders surveyed report that they are preparing for routine AI-powered attacks. This suggests that practitioners no longer regard the threat as hypothetical, even if documented in-the-wild cases remain limited to these initial families.
Critical Analysis
| Claim | Source Evidence | Verdict |
|---|---|---|
| First malware families using LLMs at runtime have been identified (PROMPTFLUX, PROMPTSTEAL) | Documented in Trend Micro 1H 2025 threat intelligence and corroborated in MDPI AI survey | ✅ Supported — novel finding with named malware families |
| AI cybersecurity tool adoption increased substantially | Reported in MDPI AI survey synthesis across multiple industry studies | ⚠️ Aggregated metric — methodology of aggregation across studies may vary |
| A large majority of security leaders are preparing for routine AI-powered attacks | Survey data reported in the literature synthesis | ⚠️ Survey-dependent — sample composition and question framing affect interpretation |
| LLMs are used for both attack and defense in cybersecurity | Multiple documented use cases on both sides | ✅ Supported — well-established dual-use pattern |
What the Evidence Shows
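PROMPTFLUX represents a qualitative shift rather than a quantitative escalation. The malware does not simply use AI to be "better" at what malware already did; it introduces a new capability class. By invoking LLMs at runtime, the malware can adapt its behavior during execution, generate context-appropriate social engineering content, and modify its attack strategy based on the target environment.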
PROMPTSTEAL, the second documented family, focuses specifically on credential harvesting, using LLM-generated content to construct convincing authentication pages and session hijacking pretexts.
What Remains Uncertain
The prevalence of LLM-runtime malware in the wild is difficult to assess. The documented cases may represent the visible edge of a larger trend, or they may be proof-of-concept outliers that have not yet achieved widespread adoption in the criminal ecosystem. The reported growth in defensive AI tool adoption is similarly hard to pin down: it aggregates across heterogeneous studies with different measurement approaches, which makes direct comparison difficult.
The cost structure also matters. Calling commercial LLM APIs from malware creates a financial trail and depends on API access that providers can revoke. Whether attackers will shift to locally hosted open-weight models (eliminating the API dependency) or develop novel access methods remains an open question.
Open Questions
Detection methodology: How should security tools identify LLM API calls originating from malware as distinct from legitimate application usage? Distinguishing malicious from legitimate calls within the same enterprise environment requires behavioral context that current tools may not capture.
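One way to picture the behavioral context in question is a triage rule that combines the destination host with the process that opened the connection. The sketch below is an assumption-laden illustration, not a description of any existing product: the event shape (`ConnectionEvent`), the approved pairs, the host list, and the `triage` function are all hypothetical, and the host list is deliberately incomplete.

```python
from dataclasses import dataclass

# Illustrative, incomplete set of LLM API hostnames (assumption for this sketch).
LLM_API_HOSTS = {
    "api.openai.com",
    "generativelanguage.googleapis.com",
    "api-inference.huggingface.co",
}

# Hypothetical allowlist of (process, host) pairs the organization has approved.
APPROVED = {("chat-assistant.exe", "api.openai.com")}

# Script interpreters that rarely have a legitimate reason to call an LLM API.
SCRIPT_HOSTS = {"wscript.exe", "cscript.exe", "powershell.exe"}


@dataclass(frozen=True)
class ConnectionEvent:
    process: str  # name of the process that opened the connection
    parent: str   # name of that process's parent
    host: str     # destination hostname


def triage(event: ConnectionEvent) -> str:
    """Classify an outbound connection as ignore, allow, alert, or review."""
    if event.host not in LLM_API_HOSTS:
        return "ignore"  # not LLM traffic, out of scope for this rule
    if (event.process, event.host) in APPROVED:
        return "allow"   # sanctioned application talking to its expected API
    if event.process in SCRIPT_HOSTS or event.parent in SCRIPT_HOSTS:
        return "alert"   # script interpreter in the chain reaching an LLM API
    return "review"      # unknown caller: needs analyst attention


event = ConnectionEvent("wscript.exe", "explorer.exe", "generativelanguage.googleapis.com")
print(triage(event))
```

The host alone is not a useful signal, since sanctioned tools reach the same endpoints. The rule only becomes discriminating once process lineage is available, which is exactly the telemetry that may be missing in practice.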
Open-weight model implications: If attackers migrate from API-based LLMs to locally hosted models (Llama, Mistral, and others), API-level detection strategies become irrelevant. The security community has not yet developed approaches for detecting locally hosted LLM inference within malware execution environments.
Escalation dynamics: Will defensive AI and offensive AI enter a co-evolutionary arms race? Historical precedent from other dual-use technologies (cryptography, network security) suggests co-evolution rather than decisive advantage.
Closing Reflection
The appearance of PROMPTFLUX and PROMPTSTEAL marks a transition point—malware that thinks, adapts, and communicates using the same language models that power enterprise productivity tools. The substantial increase in defensive AI adoption suggests that defenders are not standing still, but the fundamental asymmetry of security—where attackers need to find one weakness while defenders must protect all surfaces—is amplified when the attacker's tool can generate novel approaches at machine speed. The question is no longer whether AI will reshape cybersecurity, but whether defensive applications of the same technology can maintain pace with adversarial innovation.