Researchers Push Frontiers of Agentic AI With New Tools for Security, Speed and Optimisation

Three separate studies address key engineering and safety challenges as autonomous AI systems move toward large-scale deployment

By Zotpaper
A cluster of new research papers published this week outlines significant advances in agentic AI — autonomous systems that chain together multiple AI model calls and tool executions to complete complex tasks — addressing three of the field's most pressing challenges: security vulnerabilities, scheduling inefficiencies, and the difficulty of adapting to shifting real-world objectives.

As agentic AI systems move from research laboratories into production environments, engineers and security researchers are racing to build the infrastructure needed to make them reliable, fast, and safe. Three papers published this week on arXiv offer a snapshot of where that work now stands.

A New Benchmark for Red-Teaming Agentic Systems

A team including Yarin Yerushalmi Levi, Roy Betser, and colleagues introduced RIFT-Bench, a methodology designed to stress-test the security of agentic AI systems at scale. Unlike earlier evaluation tools that were tied to specific platforms or use cases, RIFT-Bench uses a graph-based representation of agentic architectures to run standardised adversarial attacks across heterogeneous systems.

The framework operates in two automated phases: a Discovery phase that maps out a system's structure, and a Scanning phase that deploys adaptive adversarial probes and generates a security report. The researchers tested the approach across 45 distinct agentic systems, concluding that it generalises effectively across different implementations. RIFT-Bench can also directly evaluate the effectiveness of mitigation strategies, giving developers a way to measure whether their defences are working.
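To make the two-phase idea concrete, here is a minimal Python sketch of a discovery pass followed by a scanning pass over a graph of agents and tools. It is an illustration only, not RIFT-Bench's implementation: the `SYSTEM` graph, the function names `discover`, `scan` and `toy_probe`, and the flagging rule are all assumptions invented for this example.

```python
from collections import deque

# Hypothetical toy system: nodes are agents or tools, edges are "can call" links.
SYSTEM = {
    "planner": ["browser", "coder"],
    "browser": [],
    "coder": ["shell"],
    "shell": [],
}

def discover(system, entry):
    """Discovery phase (illustrative): breadth-first walk from the entry point."""
    seen, queue = [entry], deque([entry])
    while queue:
        node = queue.popleft()
        for neighbour in system[node]:
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen

def scan(nodes, probe):
    """Scanning phase (illustrative): apply one probe per node, collect a report."""
    return {node: probe(node) for node in nodes}

def toy_probe(node):
    # Placeholder rule, not a real attack: flag the node that can run commands.
    return "flagged" if node == "shell" else "ok"

report = scan(discover(SYSTEM, "planner"), toy_probe)
print(report)
```

Because the scan works on the graph rather than on any one platform's API, the same probe can in principle be reused across differently built systems, which is the property the researchers emphasise.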

The paper notes that agentic systems introduce attack vectors that go beyond those of traditional large language models, given their capacity to take autonomous actions, access external tools, and operate across extended time horizons.

Cutting Latency in Multi-Agent Deployments

A separate team from institutions including the University of Edinburgh presented SwarmX, a scheduling system designed to reduce the tail latency that plagues large-scale agentic deployments. The core problem the researchers identified is that conventional schedulers — designed for more predictable workloads — struggle with agentic tasks because their execution time and structure depend heavily on the content of each prompt.

SwarmX addresses this by using neural predictors that incorporate prompt features, device state, and runtime conditions to make smarter routing and scaling decisions. In tests conducted on a 128-GPU testbed and validated against a near-thousand-GPU production environment, SwarmX reduced tail latency by up to 61.5 per cent and sustained up to twice the throughput of existing production schedulers under equivalent service-level constraints. The team evaluated the system on multi-agent code generation, deep research tasks, and multimodal workflows.
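The general idea of prediction-aware routing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SwarmX's scheduler: `predict_seconds` is a hypothetical hand-written formula standing in for a learned predictor, its coefficients are invented, and the routing rule (send each request to the device with the smallest predicted backlog) is a generic greedy heuristic.

```python
def predict_seconds(prompt_tokens, tool_calls):
    # Hypothetical stand-in for a learned predictor; the coefficients are invented.
    return prompt_tokens / 100 + 2.0 * tool_calls

def route(requests, num_devices):
    """Assign each request to the device with the smallest predicted backlog."""
    backlog = [0.0] * num_devices
    placement = []
    for prompt_tokens, tool_calls in requests:
        cost = predict_seconds(prompt_tokens, tool_calls)
        # Ties go to the lowest-numbered device.
        device = min(range(num_devices), key=lambda d: backlog[d])
        backlog[device] += cost
        placement.append(device)
    return placement, backlog

# Each request is (prompt_tokens, expected_tool_calls); the values are made up.
requests = [(1000, 5), (200, 0), (300, 1), (100, 0)]
placement, backlog = route(requests, num_devices=2)
print(placement, backlog)
```

In this toy run the one long request is isolated on the first device while the three short ones share the second, so short jobs do not queue behind the slow one. That queueing effect is a common source of tail latency when a scheduler cannot tell requests apart in advance.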

Adapting AI to Shifting Operator Policies

A third paper, from researchers at Shanghai Jiao Tong University and the Singapore University of Technology and Design, tackled the challenge of deploying agentic AI in physical systems where the rules and objectives change over time. Their framework, Agentic-LTPO, uses a bilevel optimisation structure: a higher-level agentic layer interprets evolving operator policies and translates them into configuration parameters, while a lower-level solver handles real-time decisions.
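The layering can be illustrated with a small Python sketch. It shows only the division of labour between the two levels and is not Agentic-LTPO itself or a real bilevel optimiser: the policy names, the weights, the candidate configurations and the scoring rule are all hypothetical.

```python
def interpret_policy(policy):
    """Upper level (illustrative): map an operator policy to objective weights."""
    # Hypothetical policy names and weights, invented for this sketch.
    table = {
        "max_throughput": {"rate": 1.0, "power": 0.0},
        "save_energy": {"rate": 0.2, "power": 1.0},
    }
    return table[policy]

def solve(weights, candidates):
    """Lower level (illustrative): pick the candidate with the best weighted score."""
    def score(c):
        return weights["rate"] * c["rate"] - weights["power"] * c["power"]
    return max(candidates, key=score)

# Made-up configurations with a data rate and a power cost in arbitrary units.
CANDIDATES = [
    {"name": "high_power", "rate": 10.0, "power": 8.0},
    {"name": "low_power", "rate": 6.0, "power": 2.0},
]

choices = [
    solve(interpret_policy(policy), CANDIDATES)["name"]
    for policy in ("max_throughput", "save_energy")
]
print(choices)
```

When the policy changes, only the weights handed down by the upper level change; the lower-level solver is reused as-is and simply arrives at a different decision.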

The team demonstrated the approach on cell-free MIMO beamforming, a wireless network optimisation problem, where Agentic-LTPO improved long-term system performance by 57.2 per cent compared with traditional fixed-objective methods. The researchers argue the framework could apply broadly to physical-layer network management as telecommunications operators deal with increasingly dynamic service requirements.

Taken together, the three papers reflect a field grappling with the operational realities of deploying AI that acts, not just advises. The research community is beginning to produce the engineering foundations — security benchmarks, schedulers, and adaptive optimisers — that large-scale agentic deployment will require.

§

Analysis

Why This Matters

  • Agentic AI is moving rapidly from experimental to production deployment, and these papers address foundational gaps — security, performance, and adaptability — that could determine whether that transition succeeds or creates new risks.
  • The scale of infrastructure described (nearly 1,000 GPUs and 1 million CPU cores in the SwarmX deployment) signals that agentic systems are already operating at enterprise scale, making safety and efficiency research urgently practical.
  • Standardised security benchmarks like RIFT-Bench could become the basis for future regulatory or compliance frameworks governing autonomous AI systems.

Background

Agentic AI refers to systems in which large language models are given access to tools, memory, and the ability to take sequential actions — effectively allowing them to complete multi-step tasks with minimal human intervention. The architecture differs fundamentally from a simple chatbot: an agentic system might browse the web, write and execute code, send emails, and call APIs, all in service of a single user goal.

Interest in agentic systems accelerated sharply following the commercial success of LLMs like GPT-4 and Claude, with companies including OpenAI, Anthropic, Google DeepMind, and dozens of startups launching agentic products from 2023 onward. Early deployments revealed a new class of problems: these systems were harder to evaluate than static models, prone to unexpected behaviour when tool calls failed or chained in unanticipated ways, and difficult to secure because their attack surface extended far beyond the model itself.

The academic research community has responded with a growing body of work on agentic benchmarks, alignment, and infrastructure. The three papers published this week sit within that broader effort, each addressing a distinct engineering layer — security evaluation, runtime scheduling, and objective management — that will need to mature before agentic systems can be deployed reliably at societal scale.

Key Perspectives

AI Researchers and Developers: The authors of all three papers frame their work as providing missing infrastructure. The recurring theme is that existing tools — schedulers, security evaluators, optimisation frameworks — were designed for simpler, more static systems and do not transfer well to agentic architectures. Their solutions aim to be general-purpose rather than domain-specific.

Enterprise Operators: For organisations deploying or considering agentic AI, the SwarmX and RIFT-Bench results are directly relevant. A 61.5 per cent reduction in tail latency and a structured security evaluation methodology address two of the most common objections to production deployment: unpredictable performance and difficulty auditing system behaviour.

Critics and Safety Advocates: Security researchers have warned that agentic systems present qualitatively new risks compared to conventional AI tools, including the potential for goal misalignment across long action sequences, prompt injection via external tool outputs, and difficulty tracing harmful outcomes back through chains of automated decisions. While RIFT-Bench represents progress on evaluation, critics may note that the existence of a red-teaming tool does not itself make systems safer — it depends on whether developers act on the findings.

What to Watch

  • Whether RIFT-Bench or similar frameworks are adopted as industry-standard evaluation tools, particularly as governments in the EU and US develop AI auditing requirements.
  • The uptake of SwarmX-style scheduling in major cloud platforms (AWS, Azure, Google Cloud), which would signal that agentic workloads are becoming a first-class infrastructure concern.
  • How quickly the bilevel optimisation approach demonstrated in Agentic-LTPO is applied beyond wireless networks to other physical-layer domains such as power grids or autonomous vehicle coordination, where policy-driven adaptation is equally critical.

Zotpaper

Articles published under the Zotpaper byline are synthesized from multiple source publications by our AI editor and reviewed by our editorial process. Each story combines reporting from credible outlets to give readers a balanced, comprehensive view.