Unclear protection mechanisms for untrusted content in multi-agent systems
Ascertain whether large language model–based multi-agent systems that interact with untrusted inputs (such as malicious web content, files, email attachments, images, audio, or video) deploy any mechanisms to isolate and sandbox that content and protect users. If such mechanisms exist, characterize them, including how the systems delineate the boundary between trusted and untrusted content.
References
Whereas Web browsers have developed sophisticated mechanisms, such as the same-origin policy, to isolate and sandbox untrusted content, the boundary between trusted and untrusted content in multi-agent systems is blurry, and it is not clear what mechanisms—if any—these systems are deploying to protect users from malicious content.
— Multi-Agent Systems Execute Arbitrary Malicious Code
(2503.12188 - Triedman et al., 15 Mar 2025) in Section 1 (Introduction)