Comprehensive protection of multi-agent systems from untrusted content

Develop comprehensive, end-to-end defenses for multi-agent systems that interact with untrusted content. Such defenses should secure planning, delegation, inter-agent communication, and execution, so that control-flow hijacking and related indirect prompt injection attacks are prevented.

Background

The paper demonstrates that current defenses based on alignment checks and least privilege are brittle or insufficient when multi-agent systems must autonomously adapt to errors and interact with untrusted data. The proposed CFG-and-rules defense improves security but is not claimed to be comprehensive.
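To make the idea of constraining control flow concrete, the following is a minimal sketch, not the paper's implementation. It assumes "CFG" refers to a control-flow graph over agent invocations; the agent names, the `ALLOWED_EDGES` table, and the `first_violation` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical allowlist of delegation edges: caller -> permitted callees.
# A real system would derive this per task; here it is hard-coded.
ALLOWED_EDGES: Dict[str, Set[str]] = {
    "orchestrator": {"web_surfer", "file_reader"},
    "web_surfer": {"orchestrator"},
    "file_reader": {"orchestrator"},
}


def first_violation(
    trace: List[str],
    allowed: Dict[str, Set[str]] = ALLOWED_EDGES,
) -> Optional[Tuple[str, str]]:
    """Return the first (caller, callee) transition not in the allowlist,
    or None if every consecutive transition in the trace is permitted."""
    for caller, callee in zip(trace, trace[1:]):
        if callee not in allowed.get(caller, set()):
            return (caller, callee)
    return None


# A benign run stays inside the permitted graph.
benign = ["orchestrator", "web_surfer", "orchestrator", "file_reader"]

# A hijacked run: untrusted web content steers control to an agent
# that the web-facing agent was never permitted to invoke.
hijacked = ["orchestrator", "web_surfer", "code_executor"]

if __name__ == "__main__":
    print(first_violation(benign))
    print(first_violation(hijacked))
```

A check like this only constrains which agent may hand off to which. It says nothing about the content passed along a permitted edge, which is one reason a graph check alone would not amount to comprehensive protection.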

In the conclusion, the authors explicitly state that achieving comprehensive protection for multi-agent systems against untrusted content remains an open research problem, emphasizing the need for more general solutions that reconcile safety with functionality in complex, delegated agent ecosystems.

References

How to comprehensively protect multi-agent systems from untrusted content remains an open research problem.

Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems (2510.17276 - Jha et al., 20 Oct 2025) in Section 8 (Conclusion)