Agent Memory: What to Store, How to Compress, and Prevent Staleness

Characterize effective long-term memory design for LLM-based AI agents by specifying what categories of state to store (episodic, semantic, procedural), deriving compression and summarization methods that preserve critical constraints, and establishing safeguards that prevent stale or low-quality memory from dominating subsequent decisions.
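The three memory categories above can be made concrete as a typed record store. The sketch below is illustrative only: the field names (`source`, `verified`, `created_at`) and the flat list store are assumptions, not a prescribed schema.

```python
# Hypothetical sketch of the three memory categories; field names and the
# flat-list store are illustrative assumptions, not a prescribed design.
from dataclasses import dataclass, field
from enum import Enum
import time

class MemoryKind(Enum):
    EPISODIC = "episodic"      # what happened: events, tool calls, outcomes
    SEMANTIC = "semantic"      # facts and constraints about the task/world
    PROCEDURAL = "procedural"  # how to act: skills, recipes, policies

@dataclass
class MemoryRecord:
    kind: MemoryKind
    content: str
    source: str                # provenance: where this memory came from
    created_at: float = field(default_factory=time.time)
    verified: bool = False     # has the write passed a quality/policy check?

store = [
    MemoryRecord(MemoryKind.EPISODIC, "User rejected plan A at step 3", "session-12"),
    MemoryRecord(MemoryKind.SEMANTIC, "Deploys are frozen on Fridays", "runbook"),
    MemoryRecord(MemoryKind.PROCEDURAL, "To deploy: run tests, then release", "runbook"),
]
episodic = [r for r in store if r.kind is MemoryKind.EPISODIC]
print(len(episodic))  # 1
```

Separating the categories at the schema level lets each get its own retention and compression policy, e.g. aggressive summarization of episodic logs but verbatim retention of semantic constraints.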

Background

Long-horizon tasks require memory beyond the context window, but naïve designs risk contradictions, persistent prompt injections, or decision bias from outdated information. Deciding which memory types to maintain, how to compress them efficiently, and how to enforce provenance and quality is essential for reliability.

These design choices interact with budgets (tokens, retrieval fan-out) and safety. Establishing principled policies for memory writes and updates, along with verification and provenance tracking, is necessary to maintain consistency and robustness in real deployments.
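A minimal staleness safeguard combines a time-decayed retrieval score with a verification gate on writes. The half-life and the multiplicative scoring rule below are illustrative choices, not a recommendation from the source.

```python
# Minimal staleness safeguard: unverified writes are quarantined (score 0),
# and verified memories decay with age. The 30-day half-life and the
# relevance * quality * decay scoring rule are illustrative assumptions.
import math

HALF_LIFE_DAYS = 30.0

def retrieval_score(relevance, quality, age_days, verified):
    if not verified:
        return 0.0  # quarantined: never reaches the prompt
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)  # exponential age decay
    return relevance * quality * decay

fresh = retrieval_score(relevance=0.9, quality=0.8, age_days=0, verified=True)
stale = retrieval_score(relevance=0.9, quality=0.8, age_days=90, verified=True)
tainted = retrieval_score(relevance=1.0, quality=1.0, age_days=0, verified=False)
assert fresh > stale > tainted == 0.0
```

Gating on `verified` at retrieval time, rather than only at write time, means a policy change (e.g. a source later found to be injected) can retroactively silence memories without rewriting the store.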

References

Retrieval-augmented generation is a strong baseline, but open questions include what to store (episodic vs. semantic vs. procedural memory), how to compress and summarize without losing critical constraints, and how to prevent stale or low-quality memory from dominating decision making.

AI Agent Systems: Architectures, Applications, and Evaluation (2601.01743 - Xu, 5 Jan 2026) in Section 7.2 (Long-Term Memory, Context Management, and Continual Improvement)