- The paper introduces a unified framework that compresses agent experience from raw traces to declarative rules, enabling scalable learning.
- It quantifies the trade-off between specificity and transferability, with empirical results showing performance gains of up to 68.5pp from skill-based representations.
- The work highlights open research areas, including adaptive compression control and lifecycle management of agent knowledge artifacts.
Experience Compression Spectrum: A Unified Framework for LLM Agent Knowledge
Introduction
The paper "Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents" (2604.15877) presents a formal conceptual framework to unify disparate approaches in agent memory, skill discovery, and rule extraction for LLM-based agents. The core thesis is that these traditionally isolated paradigms are best understood as points along a single spectrum of increasing experience compression. The authors argue that structuring knowledge—derived from agent-environment interaction—via varying levels of semantic compression can ameliorate critical scalability and efficiency issues arising from long-horizon, persistent deployments. The essay below reviews the technical content, implications, and open research directions stemming from this framework.
The spectrum is defined as four discrete levels of compression for artifacts derived from agent interaction traces (a minimal code sketch follows the list):
- Level 0 (Raw Trace): Uncompressed logs and trajectories with zero abstraction (1:1 compression).
- Level 1 (Episodic Memory): Structured events, key-value extractions, and summaries—context-specific, but partially abstracted (compression 5–20×).
- Level 2 (Procedural Skill): Higher-order patterns, workflows, and routines generalized across contexts (compression 50–500×).
- Level 3 (Declarative Rule): Maximally compressed, context-invariant domain principles, policies, and constraints (compression 1000×+).
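To make the spectrum concrete, here is a minimal Python sketch that encodes the four levels and their reported compression ranges as a data structure. The class and field names are illustrative assumptions, not drawn from the paper:

```python
from dataclasses import dataclass
from enum import IntEnum


class CompressionLevel(IntEnum):
    """The four discrete levels of the Experience Compression Spectrum."""
    RAW_TRACE = 0         # uncompressed logs and trajectories
    EPISODIC_MEMORY = 1   # structured events, extractions, summaries
    PROCEDURAL_SKILL = 2  # generalized workflows and routines
    DECLARATIVE_RULE = 3  # context-invariant principles and constraints


@dataclass(frozen=True)
class LevelProfile:
    """Approximate compression ratio range reported for each level."""
    ratio_low: float      # lower bound of the compression ratio
    ratio_high: float     # upper bound (inf = open-ended)
    context_bound: bool   # whether utility depends on the source context


SPECTRUM = {
    CompressionLevel.RAW_TRACE:        LevelProfile(1, 1, True),
    CompressionLevel.EPISODIC_MEMORY:  LevelProfile(5, 20, True),
    CompressionLevel.PROCEDURAL_SKILL: LevelProfile(50, 500, False),
    CompressionLevel.DECLARATIVE_RULE: LevelProfile(1000, float("inf"), False),
}
```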
Each level presents distinct trade-offs in generalizability, specificity, and downstream utility. Notably, empirical results demonstrate that higher-compression representations yield stronger cross-domain transfer and operational efficiency; for instance, SkillRL reports a 68.5pp improvement using L2 skills over L1 experience in ALFWorld, and RuleShaping finds that negative L3 constraints produce 7–14pp gains in coding agent benchmarks.
The authors map over 20 systems to this spectrum, revealing pronounced clustering at L1 (e.g., Mem0, MemoryOS, SSGM) and L2 (e.g., Voyager, EvoSkill, CASCADE), scant activity at L3, and no adaptive multi-level systems. ExpeL and AutoAgent straddle L1–L2 but lack dynamic selection capability. No extant system performs online, adaptive promotion or demotion along the spectrum, a gap the authors call the "missing diagonal."
Core Findings and Structural Insights
Several critical structural insights are exposed by viewing agent learning through the lens of the compression spectrum:
- Sparse Cross-Community Exchange: Citation analysis (1,136 references, 22 key papers) shows <1% cross-community engagement between memory and skill research, leading to redundant solutions to shared sub-problems (retrieval scaling, knowledge lifecycle management).
- Evaluation Coupling to Compression Level: Memory systems are evaluated via QA/IR metrics, skill systems via task success, with no standard methodology for rules. This results in local optimization disconnected from global agent efficacy.
- Transferability–Specificity Trade-off: Higher-compression artifacts (skills, rules) exhibit superior transfer properties but at the cost of actionable specificity. The optimal deployment architecture likely lies in leveraging all levels adaptively.
- Neglected Lifecycle Management: Existing approaches predominantly address acquisition; systematic versioning, update, and conflict resolution strategies—common in software engineering—are not yet mainstream for agent knowledge artifacts.
These insights motivate architectural unification, with common modules supporting retrieval, conflict detection, and lifecycle management independent of compression level, and meta-controllers for adaptive selection.
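One way to operationalize this unification is a shared interface: the same retrieval, conflict-detection, and lifecycle operations apply to artifacts at any level. The sketch below is a hypothetical interface reflecting that argument, not an API from any surveyed system; artifacts are tagged with an integer level 0–3:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Artifact:
    """Any stored knowledge item: trace, memory, skill, or rule."""
    content: str
    level: int  # 0 = raw trace ... 3 = declarative rule


class KnowledgeStore(ABC):
    """Level-agnostic store: one interface serves all four compression levels."""

    @abstractmethod
    def retrieve(self, query: str, k: int = 5) -> List[Artifact]:
        """Return the k artifacts most relevant to the query, at any level."""

    @abstractmethod
    def detect_conflicts(self, candidate: Artifact) -> List[Artifact]:
        """Return stored artifacts, at any level, that contradict the candidate."""

    @abstractmethod
    def deprecate(self, artifact: Artifact, reason: str) -> None:
        """Retire an artifact while preserving its provenance for audit."""
```

A meta-controller for adaptive selection would then sit above such a store; a toy version appears under the open problems below.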
Open Problems and Design Principles
The authors distill several outstanding research challenges and design desiderata:
1. Adaptive Compression Control:
Design meta-controllers capable of dynamically routing new experience to the optimal compression function C_L, and supporting both upward (promotion) and downward (demotion) transitions as evidence warrants (a toy controller is sketched after this list). This requires quantifying the value of information for knowledge at each level and integrating efficiency constraints in deployment.
2. Cross-level Consistency and Conflict Resolution:
Formal mechanisms must be developed for maintaining consistency between co-existing knowledge artifacts at disparate levels (e.g., demoting over-abstract rules in the presence of contradictory episodic evidence).
3. Principled Lifecycle Management:
Borrowing from software engineering practice, knowledge artifacts require versioning, provenance, validation status, and deprecation policies. Tracking these with rich metadata will enable safe, scalable, and interpretable knowledge evolution.
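To make challenges 1–3 concrete, the following toy sketch pairs a software-engineering-style metadata record with a meta-controller that promotes or demotes artifacts on a crude retrieval-utility signal. All names, thresholds, and the scoring rule are hypothetical illustrations, not the paper's method:

```python
import datetime
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArtifactMetadata:
    """Lifecycle record (challenge 3): version, provenance, validation status."""
    version: int = 1
    provenance: List[str] = field(default_factory=list)  # IDs of source traces
    validated: bool = False
    deprecated: bool = False
    created_at: str = field(
        default_factory=lambda: datetime.datetime.now(datetime.timezone.utc).isoformat()
    )


@dataclass
class LeveledArtifact:
    content: str
    level: int  # 0 = raw trace ... 3 = declarative rule
    meta: ArtifactMetadata = field(default_factory=ArtifactMetadata)


class CompressionController:
    """Toy meta-controller (challenge 1). The demotion path also addresses
    challenge 2: abstractions contradicted by evidence move back down."""

    def __init__(self, promote_at: float = 0.8, demote_at: float = 0.3):
        self.promote_at = promote_at  # utility score that triggers promotion
        self.demote_at = demote_at    # score below which we demote

    @staticmethod
    def estimate_voi(hits: int, misses: int) -> float:
        """Crude value-of-information: fraction of retrievals that helped.
        A real controller would also weigh storage and inference cost."""
        total = hits + misses
        return hits / total if total else 0.5

    def step(self, artifact: LeveledArtifact, hits: int, misses: int) -> LeveledArtifact:
        """Move the artifact one level up, one level down, or keep it in place."""
        voi = self.estimate_voi(hits, misses)
        if voi > self.promote_at and artifact.level < 3:
            return self._recompress(artifact, artifact.level + 1)  # promotion
        if voi < self.demote_at and artifact.level > 0:
            return self._recompress(artifact, artifact.level - 1)  # demotion
        return artifact

    def _recompress(self, artifact: LeveledArtifact, new_level: int) -> LeveledArtifact:
        """Stub: a real system would re-run the level-appropriate compressor
        (summarizer, skill inducer, rule extractor) over the provenance traces."""
        meta = ArtifactMetadata(
            version=artifact.meta.version + 1,
            provenance=list(artifact.meta.provenance),
            validated=False,  # re-compressed artifacts must be re-validated
        )
        return LeveledArtifact(artifact.content, new_level, meta)
```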
Further technical directions include reward-free compression, cross-domain skill transfer protocols, and multimodal extension to visual and embodied domains.
Design Principles:
- Implement level-agnostic, parameterized compression cores (see the sketch after this list).
- Embed explicit bidirectional promotion/demotion for knowledge artifacts.
- Integrate continuous governance pipelines for maintenance, validation, and consolidation, potentially exploiting idle-time computation.
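As a concrete reading of the first principle, a single compression core can be parameterized by target level rather than built as separate memory, skill, and rule pipelines. This sketch assumes an arbitrary `llm` callable (prompt in, completion out); the prompt texts are illustrative placeholders:

```python
# Hypothetical level-parameterized compression core; prompts are assumptions.
LEVEL_PROMPTS = {
    1: "Summarize this interaction trace into a structured episodic memory entry.",
    2: "Induce a reusable, parameterized workflow (skill) from these traces.",
    3: "Distill context-invariant rules or constraints supported by these traces.",
}


def compress(traces: list, target_level: int, llm) -> str:
    """One entry point for all levels; `llm` is any callable: str -> str."""
    if target_level == 0:
        return "\n".join(traces)  # L0: keep the raw trace, 1:1
    prompt = LEVEL_PROMPTS[target_level]
    return llm(prompt + "\n\n" + "\n\n".join(traces))
```

Under this design, promotion and demotion (the second principle) reduce to calling `compress` again with a different `target_level` over the artifact's provenance traces.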
Implications for Scalable Agentic AI
Practically, the Experience Compression Spectrum provides a rigorous framework for scaling LLM agents while minimizing compute and retrieval bottlenecks, a rapidly growing challenge as agents accrue vast interaction traces in production environments. The highlighted performance improvements from skills over memories (e.g., up to 68.5pp) strongly recommend moving beyond monolithic, fixed-level memory systems. The absence of automated L3 rule extraction, despite evidence that human practitioners already distill rules manually (e.g., CLAUDE.md, .cursorrules), marks a critical gap: adaptive, spectrum-spanning architectures will be necessary for agents to generalize robustly while maintaining specificity.
Theoretically, the framework yields testable predictions: (i) higher-compression levels should outperform lower-compression levels in cross-domain adaptation when data is held constant, (ii) compound, multi-level knowledge stores should show superadditive performance gains as the deployment horizon increases, and (iii) transferability–specificity curves are non-linear, with the optimal trade-off at intermediate levels.
Conclusion
The Experience Compression Spectrum formalizes the extraction of agent knowledge—episodic, procedural, and declarative—as points along a unified compression axis rather than disjoint paradigms. By mapping contemporary systems and aggregating empirical evidence, the authors expose critical architectural and research deficits, notably the lack of adaptive, multi-level systems (the "missing diagonal") and persistent neglect of lifecycle management. The framework generates concrete directions for building agents capable of efficient, robust, and scalable long-term operation through dynamic experience abstraction. Future progress in LLM agents will depend on bridging these gaps with generalizable, spectrum-aware architectures.