Three independent research papers published to arXiv on June 1, 2026 converge on a shared concern: as AI agents are increasingly trusted to perform real-world tasks — from scheduling and file retrieval to malware analysis — they introduce new attack surfaces that existing security frameworks have not fully addressed.
Injection Depth Is the Dominant Risk Factor
In the most controlled of the three studies, researcher Mohammadreza Rashidi examined how the position of a malicious instruction within a tool-use sequence affects the likelihood of an AI agent following it — a metric known as attack success rate (ASR).
Testing GPT-4o-mini and Claude Haiku across 460 trials and 20 scenarios, Rashidi found that injection depth — how far into a tool-calling sequence a payload appears — is the single most important variable. Against GPT-4o-mini, ASR fell from 60% when a malicious instruction appeared first in the tool sequence to 0% by the fourth or fifth position. The decline was attributed both to the model resisting early injections and to the agent completing its task before encountering later payloads.
Claude Haiku performed markedly better, recording 0% ASR at every depth tested, which Rashidi attributed to conservative tool invocation habits and a stronger baseline resistance to instruction hijacking.
The study also found that the rhetorical style of an injected payload — its "framing" — significantly affects success rates, with persona-based framings achieving 75% ASR compared to 25% for neutral language. Turn budget, however, proved irrelevant: agents were equally vulnerable whether given three or seven turns to complete a task.
Malware Files as a New Injection Vector
A separate pair of papers from Brian Crawford, Justin Phillips, and Patrick McClure explored a distinct but related threat: prompt injection attacks embedded directly inside executable binary files targeted at AI-powered malware analysis tools.
Tools like Ghidra, when paired with large language model integrations such as GhidraMCP, allow malware analysts to automate the interpretation of decompiled code. The researchers demonstrated that adversaries can embed hidden instructions — using extraneous string variable assignments — inside binaries that pass commands to the underlying LLM without affecting the file's actual execution behaviour.
Using a genetic algorithm-based technique adapted from an existing adversarial method called AutoDAN, the team showed that such injections can cause AI analysis pipelines to misinterpret or misreport what a piece of malware actually does — potentially allowing malicious software to evade automated detection.
A companion paper from Crawford and McClure then investigated both detection methods and obfuscation techniques, finding that while defenders can identify prompt injection strings in decompiler output, attackers can in turn obfuscate those strings — prompting a cat-and-mouse dynamic that the authors say must be understood before such systems are deployed in production cybersecurity environments.
Implications for Deployed AI Systems
Taken together, the three papers underscore a structural tension in agentic AI design: the same tool-use loops that make these systems productive also make them susceptible to manipulation from any data source they ingest. Sanitising the first tool observation, Rashidi notes, would capture approximately 67% of measured injection successes — a meaningful but incomplete mitigation.
None of the papers propose fully solved defences, but all three emphasise that awareness of these attack patterns is a prerequisite for safe deployment.