Agent Context Optimization

Updated 3 July 2026

Agent Context Optimization is a systematic approach that compresses, curates, and orchestrates context for LLM agents to maximize task success and efficiency.
ACON employs alternating stages of utility-maximization and cost-minimization to refine natural-language guidelines using LLM feedback and contrastive failures.
Empirical results demonstrate up to 54% token reduction and over 95% accuracy retention through efficient compressor distillation and optimized context management.

Agent Context Optimization (ACON) refers to the systematic and principled compression, curation, and orchestration of context for LLM agents operating in long-horizon, multi-step, and resource-constrained environments. The goal is to maximize task success and efficiency by dynamically managing the historical observations and intermediate outcomes supplied to agent policies, while minimizing unnecessary memory usage and computational overhead. Fundamentally, ACON shifts context management from passive or static heuristics to data-driven, optimizable procedures that leverage learning, reflection, and explicit trade-off control over evidence fidelity and cost (Kang et al., 1 Oct 2025).

1. Formal Structure and Objectives

ACON is formulated as an optimization problem over two components: the context compression functions and a natural-language guideline (prompt) that specifies compression semantics. Within the Partially Observable Markov Decision Process (POMDP) formalism for agent tasks, the agent operates under a fixed policy $\pi(a \mid h, o)$, where $h$ is the agent’s full action–observation history and $o$ is the latest observation. Without compression, the context length grows linearly with the episode length, so per-step and cumulative token and compute costs balloon.
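
As a rough illustration of this growth, assume each step appends about $c$ tokens to the history (a simplifying assumption, not a figure from the source). The per-step context size and the cumulative tokens processed over $T$ steps are then approximately

$|h_t| \approx c\,t, \qquad \sum_{t=1}^{T} |h_t| \approx c\,\frac{T(T+1)}{2},$

so under this assumption doubling the horizon roughly quadruples the cumulative cost for long episodes.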

Let $f_{\rm hist}(h_t; \psi)$ and $f_{\rm obs}(o_t, h_{t-1}; \psi)$ denote compressor functions for history and observation, each parameterized by $\phi$ and guided by a natural-language guideline $r$, packed together as $\psi = (\phi, r)$. The compressed context $\tau'(\psi)$ over $T$ steps is

$\tau'(\psi)=\{(h'_{t-1}, o'_t)\}_{t=1}^T$

with cost

$C(\tau'(\psi)) = \sum_{t=1}^T \mathrm{cost}(h'_{t-1}, o'_t).$

With the agent policy fixed, the overall objective is the Lagrangian:

$\max_\psi\ \mathbb{E}[R(\psi)] - \lambda\,\mathbb{E}[C(\tau'(\psi))]$

where $\lambda\ge 0$ controls the task-vs-cost trade-off. This structure is instantiated in frameworks such as ACON (Kang et al., 1 Oct 2025), PAACE (Yuksel, 18 Dec 2025), EvoDS (Yang et al., 2 Jun 2026), and ACE (Adaptive Context Elasticizer) (Liao et al., 30 Jun 2026).
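
The objective above can be estimated from sampled episodes. The following is a minimal sketch, assuming per-step costs are measured in tokens and that rewards and costs are averaged over a batch of rollouts; the `Rollout` container and function names are hypothetical and not part of any of the cited frameworks.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    """One episode run under a given psi = (phi, r). Hypothetical container."""
    reward: float                 # task reward R, e.g. 1.0 for success
    step_token_costs: list[int]   # cost(h'_{t-1}, o'_t) for each step t


def context_cost(rollout: Rollout) -> int:
    """C(tau'): sum of the per-step costs of the compressed context."""
    return sum(rollout.step_token_costs)


def lagrangian_estimate(rollouts: list[Rollout], lam: float) -> float:
    """Monte Carlo estimate of E[R] - lambda * E[C] over a batch of rollouts."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    avg_reward = mean(r.reward for r in rollouts)
    avg_cost = mean(context_cost(r) for r in rollouts)
    return avg_reward - lam * avg_cost
```

Because the cost is in raw tokens while the reward is typically in [0, 1], $\lambda$ in such a sketch would need to be small (on the order of the inverse of a typical episode cost) for the two terms to be comparable.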

2. Compression Guideline Optimization

Rather than tuning compressor parameters with reinforcement learning (RL), ACON explicitly optimizes the compression guideline $r$ in natural language via LLM reflection. The optimization is an alternating two-stage process:

Utility-Maximization: Runs the agent with and without compression, identifies tasks where the compressed context fails but the full context succeeds, and uses an LLM to analyze which facts were lost. Multiple candidate guideline updates are generated, and the one with the highest task success is selected.
Cost-Minimization: On tasks that succeed with the current guideline, asks the LLM to propose redundancy or spanning reductions, generating further candidate refinements, and selects among them by success minus cost.

These stages are alternated for a fixed number of rounds or until convergence. The process leverages contrastive failures, data-driven feedback, and selection on held-out evaluation tasks, ensuring that compression is both effective and tailored to the agent’s true information needs (Kang et al., 1 Oct 2025). The optimization can be extended to multi-agent settings, such as in MEMO (Xie et al., 9 Mar 2026) and PAACE (Yuksel, 18 Dec 2025), which incorporate future-plan-aware relevance scoring.
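
The alternating procedure can be sketched as a simple loop. This is an illustrative outline only: the callables `evaluate`, `propose_utility`, and `propose_cost` are hypothetical stand-ins for running the agent on evaluation tasks and for the LLM reflection calls, and keeping the incumbent guideline among the candidates is an assumption made here so that a round never selects a worse guideline.

```python
from typing import Callable

# guideline -> (task success rate, mean context cost); hypothetical evaluator
Evaluate = Callable[[str], tuple[float, float]]
# guideline -> candidate revised guidelines; hypothetical LLM reflection call
Propose = Callable[[str], list[str]]


def optimize_guideline(
    guideline: str,
    evaluate: Evaluate,
    propose_utility: Propose,
    propose_cost: Propose,
    lam: float,
    rounds: int = 3,
) -> str:
    """Alternate utility-maximization and cost-minimization for a fixed number of rounds."""

    def penalized(g: str) -> float:
        success, cost = evaluate(g)
        return success - lam * cost

    for _ in range(rounds):
        # Utility-maximization: pick the candidate with the highest task success.
        candidates = [guideline] + propose_utility(guideline)
        guideline = max(candidates, key=lambda g: evaluate(g)[0])

        # Cost-minimization: pick the candidate with the best success minus weighted cost.
        candidates = [guideline] + propose_cost(guideline)
        guideline = max(candidates, key=penalized)
    return guideline
```

In practice each `evaluate` call is expensive (it runs the agent on a task set), so a real implementation would likely cache scores per guideline rather than re-evaluating as this sketch does.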

3. Compressor Distillation and Practical Realization

A crucial component for real-world deployment is the distillation of the optimal LLM compressor (often expensive) into a smaller, efficient student model:

The teacher compressor is used to generate compressed targets (pairs of original history/observation and their compressed forms).
The student (e.g., Qwen3-14B, Phi-4, Qwen3-8B) is trained (often via LoRA) with a sequence-level cross-entropy objective to match these outputs.
At inference, the compact student compressor nearly retains teacher performance but with negligible compute overhead.

Empirically, student models preserve over 95% of the teacher’s accuracy and enable low-overhead application of optimized compression for long-horizon agents (Kang et al., 1 Oct 2025). Similar distillation is a key part of PAACE-FT (Yuksel, 18 Dec 2025), which achieves >10x inference cost reduction at 97% fidelity to teacher performance.
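
The data side of this distillation can be sketched as follows, assuming the teacher is exposed as a callable that takes a guideline and a raw context and returns compressed text. The function names, the prompt layout, and the `teacher_compress` signature are hypothetical; the loss helper only shows what a sequence-level cross-entropy computes given the student's probabilities for the target tokens, not how any particular training library implements it.

```python
import math
from typing import Callable


def build_distillation_pairs(
    contexts: list[str],
    guideline: str,
    teacher_compress: Callable[[str, str], str],
) -> list[dict[str, str]]:
    """Pair each raw context with the teacher's compressed output as a training target."""
    pairs = []
    for ctx in contexts:
        prompt = f"{guideline}\n\n{ctx}"  # assumed prompt layout for the student
        pairs.append({"prompt": prompt, "target": teacher_compress(guideline, ctx)})
    return pairs


def sequence_cross_entropy(target_token_probs: list[float]) -> float:
    """Negative log-likelihood of the target sequence, summed over its tokens."""
    return -sum(math.log(p) for p in target_token_probs)
```

The resulting prompt/target pairs would then be fed to whatever fine-tuning setup is used for the student (for example a LoRA adapter, as mentioned above).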

4. Empirical Benchmarks and Performance Analysis

ACON and related frameworks have been evaluated across a variety of long-horizon, multi-turn, and multi-agent benchmarks:

AppWorld (168 tasks, ≥15 steps):
- No compression: 56.0% accuracy, ~9.93K peak tokens.
- ACON (optimal+cost-aware): 56.5%, 7.33K (26–54% fewer tokens).
OfficeBench: ~30% reduction in peak context, >95% accuracy retained.
Multi-objective QA: ACON can exceed “no compression” baselines in EM/F1, while halving token use.
Compressor distillation: Student models preserve >95% teacher accuracy (Kang et al., 1 Oct 2025).
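
As a quick check of the AppWorld figures above, the relative reduction in peak tokens can be computed directly. This is plain arithmetic on the reported numbers; the helper name is hypothetical.

```python
def percent_reduction(baseline: float, compressed: float) -> float:
    """Relative reduction of `compressed` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - compressed) / baseline


# AppWorld peak tokens (in thousands): 9.93K without compression, 7.33K with ACON.
appworld_peak_reduction = percent_reduction(9.93, 7.33)  # about 26.2
```

The result, roughly 26%, corresponds to the lower end of the 26–54% range quoted for ACON.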

Results from PAACE show similar or greater gains, consistently lowering context size (peak and cumulative) while maintaining or improving correctness and F1 across domains (Yuksel, 18 Dec 2025). Ablations highlight critical trade-offs: overly aggressive compression reduces accuracy, while high thresholds yield little cost reduction. The choice of optimizer LLM (e.g., OpenAI o3 with contrastive feedback) also affects final accuracy, with a 3.6% drop for weaker or non-contrastive variants.

5. Methodological Extensions and Variants

ACON serves as a foundation for more intricate and scenario-specific approaches:

Plan-Awareness (PAACE): Compression leverages next-k-task relevance modeling, plan-structure DAG parsing, co-refinement of evolving instructions, and function-preserving compression (Yuksel, 18 Dec 2025).
Memory-Augmented Optimization (MEMO): Maintains persistent memory banks of extracted insights and injects them contextually using structured retention/exploration cycles, with uncertainty-aware selection and prioritized replay (Xie et al., 9 Mar 2026).
Plug-and-Play Elasticization (ACE): Separates storage (lossless raw and abstracted messages) from per-turn elastic orchestration, permitting reversible promotion/demotion of context granularity (raw/abstract/dropped) (Liao et al., 30 Jun 2026).
Meta-Optimization (Reflective Context Learning): Treats context space as a direct object of optimization, introducing batching, grouped rollouts, structured loss decomposition, failure replay, and optimizer state analogous to classical SGD (Vassilyev et al., 3 Apr 2026).
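
To make the storage/orchestration split in the elasticization variant concrete, here is a minimal sketch under stated assumptions: each stored message keeps both its raw text and an abstract, tokens are counted by whitespace splitting, and a greedy newest-first policy picks the granularity per turn. The class, the function names, and the selection policy are all hypothetical and are not taken from ACE itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredMessage:
    raw: str       # lossless original text
    abstract: str  # shorter abstraction of the same message


def n_tokens(text: str) -> int:
    """Crude token count via whitespace splitting (an assumption for this sketch)."""
    return len(text.split())


def orchestrate(store: list[StoredMessage], budget: int) -> list[tuple[str, str]]:
    """Choose a granularity (raw / abstract / dropped) per message within a token budget."""
    view: list[tuple[str, str]] = [("dropped", "")] * len(store)
    remaining = budget
    # Serve the newest messages first; older ones get whatever budget is left.
    for i in range(len(store) - 1, -1, -1):
        msg = store[i]
        for level, text in (("raw", msg.raw), ("abstract", msg.abstract)):
            cost = n_tokens(text)
            if cost <= remaining:
                view[i] = (level, text)
                remaining -= cost
                break
    return view
```

Because the store is never modified, calling `orchestrate` again with a larger budget promotes messages back to a finer granularity, which is the reversibility property described above.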

These variants generalize the ACON paradigm to settings such as multi-agent orchestration, on-device agents with stringent context/compute budgets (Vijayvargiya et al., 24 Sep 2025), and fine-grained control over evidence curation for enterprise analytics (Singh, 16 Jan 2026).

6. Design Considerations, Limitations, and Trade-offs

Key design insights emerge from extensive empirical and architectural investigations:

Compression thresholds: Aggressive compression risks relevant signal loss; loose thresholds yield limited savings.
History compression vs. observation compression: History compression often requires breaking the transformer KV-cache, adding latency; observation compression always yields net cost savings.
Compressor overhead: Distillation into small models eliminates the main overhead, enabling deployment at scale.
Failure modes: Naive truncation or summarization may irreversibly discard critical intermediate facts, such as API tokens or variables, impeding long-horizon tool use.
Optimization stability: Success depends on carefully balancing fidelity and reliability, and the balance is often tuned per agent (Yi et al., 29 May 2026).
Prompt engineering: Compression guidelines must be iteratively refined and are sensitive to the choice of optimizer LLM and to the feedback procedure used in the alternating optimization.
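
Two of the points above, threshold-triggered compression and the failure mode of naive truncation, can be illustrated with a short sketch. The function names, the whitespace token count, and the example history are assumptions made for illustration only.

```python
from typing import Callable


def maybe_compress(
    history: str,
    threshold_tokens: int,
    compress: Callable[[str], str],
) -> tuple[str, bool]:
    """Invoke the compressor only when the history exceeds the threshold."""
    if len(history.split()) <= threshold_tokens:
        return history, False
    return compress(history), True


def truncate_to_last(history: str, keep_tokens: int) -> str:
    """Naive baseline: keep only the most recent `keep_tokens` tokens."""
    if keep_tokens <= 0:
        return ""
    return " ".join(history.split()[-keep_tokens:])


# An early fact (here a made-up credential) is silently lost by truncation.
example_history = "access_token=abc123 " + "step " * 10
truncated = truncate_to_last(example_history, 5)
token_survived = "access_token" in truncated  # False
```

A low threshold triggers the compressor often and risks losing signal, while a high threshold rarely triggers it and saves little, which mirrors the trade-off described for compression thresholds.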

7. Broader Significance and Outlook

The emergence and standardization of Agent Context Optimization mark a transition from brittle, hand-tuned context management to systematic, learnable, and efficient adaptation of agent memory and prompt content. ACON and its extensions enable:

Robust performance for multi-step, tool-using agents operating over extended histories;
Generalization across task domains, including productivity, data analytics, planning, and coordination;
Seamless integration with multi-agent orchestration, memory banks, and plan-aware scheduling.

By systematizing the trade-offs between information retention, context cost, and agent compatibility, ACON provides a foundation for continual agent self-improvement and efficient use of LLM compute in real-world, long-horizon applications (Kang et al., 1 Oct 2025, Yuksel, 18 Dec 2025, Xie et al., 9 Mar 2026).