Separating Legitimate Updates from Malicious Injections in Persistent Agent Files
Determine reliable criteria and mechanisms to separate legitimate updates from malicious injections in the persistent state files that enable evolution in OpenClaw (such as MEMORY.md, SOUL.md, IDENTITY.md, USER.md, and AGENTS.md), ensuring the agent can continue to learn and adapt without providing an attack surface for state poisoning.
References
A file-protection approach reduces injection rates by up to 97% but blocks legitimate agent updates at nearly the same rate, revealing a fundamental evolution--safety tradeoff: as long as the persistent files that enable evolution are also the attack surface, separating legitimate updates from injections remains an open problem.
— Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
(2604.04759 - Wang et al., 6 Apr 2026) in Abstract