LightThinker++: Efficient LLM Reasoning Framework
- LightThinker++ is an advanced reasoning compression and memory management framework that selectively archives, expands, or condenses intermediate results for efficient long-horizon inference.
- It employs explicit adaptive memory manipulation trained via behavioral supervision to maintain robust accuracy under tight context budgets.
- The architecture achieves up to 70% memory savings and balances detail retention with resource constraints, enhancing performance on agentic and systematic reasoning tasks.
LightThinker++ is an advanced reasoning compression and memory management framework for LLMs, designed to enable deep, efficient, and long-horizon inference while minimizing computational and memory overhead. Building on LightThinker’s gist-token approach (implicit compression), LightThinker++ introduces explicit adaptive memory manipulation trained via behavioral supervision, allowing LLMs to selectively archive, expand, or condense intermediate reasoning in a manner cognizant of both logical dependencies and resource constraints. This paradigm enables state-of-the-art memory savings (up to 70%), robust accuracy under tight context budgets, and superior performance on both traditional systematic reasoning and long-horizon agentic tasks (Zhu et al., 4 Apr 2026).
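The three operations (archive, expand, condense) can be made concrete with a minimal Python sketch of a reasoning memory in which every step keeps both a full and a condensed form. Several details are assumptions made for illustration and are not the paper's interface: the class and method names, the reading of "archive" as removing a step from the active context while keeping it retrievable, and the word count used as a stand-in for tokens. In LightThinker++ the model itself learns when to apply each operation. The `enforce_budget` heuristic below is only a placeholder for that learned policy.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: names and semantics are illustrative, not LightThinker++'s API.
FULL, CONDENSED, ARCHIVED = "full", "condensed", "archived"


@dataclass
class Step:
    """One reasoning step stored in two forms (assumed dual-form entry)."""
    full: str              # complete intermediate reasoning
    summary: str           # condensed form of the same step
    state: str = FULL      # which form, if any, is currently in the active context

    def visible_text(self) -> str:
        # Text this step contributes to the active context.
        if self.state == FULL:
            return self.full
        if self.state == CONDENSED:
            return self.summary
        return ""  # archived: kept in storage but absent from the context


def n_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words stand in for model tokens.
    return len(text.split())


@dataclass
class ReasoningMemory:
    steps: list[Step] = field(default_factory=list)

    def add(self, full: str, summary: str) -> int:
        self.steps.append(Step(full, summary))
        return len(self.steps) - 1

    def condense(self, i: int) -> None:
        self.steps[i].state = CONDENSED

    def archive(self, i: int) -> None:
        self.steps[i].state = ARCHIVED

    def expand(self, i: int) -> None:
        # Restore full detail for a step that later reasoning depends on.
        self.steps[i].state = FULL

    def context(self) -> str:
        return "\n".join(t for t in (s.visible_text() for s in self.steps) if t)

    def context_tokens(self) -> int:
        return sum(n_tokens(s.visible_text()) for s in self.steps)

    def enforce_budget(self, budget: int) -> None:
        # Placeholder for the learned policy: condense, then archive, the oldest
        # steps (never the latest one) until the context fits the budget.
        for current, new_state in ((FULL, CONDENSED), (CONDENSED, ARCHIVED)):
            for s in self.steps[:-1]:
                if self.context_tokens() <= budget:
                    return
                if s.state == current:
                    s.state = new_state


mem = ReasoningMemory()
a = mem.add("Let x be the number of apples; from the first clue 3x + 2 = 17 so 3x = 15", "x = 5")
b = mem.add("Check the second clue: 2x - 1 = 9, which holds for x = 5", "clue 2 verified")
print(mem.context_tokens())  # 35
mem.condense(a)
mem.archive(b)
print(mem.context())  # x = 5
mem.expand(b)
print(mem.context_tokens())  # 18
```

Because both forms are stored, condensing or archiving a step is reversible in this sketch. `expand` brings back full detail when a later step depends on it, which mirrors the balance between detail retention and context budget described above.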
1. Motivation and Cognitive Foundations
The motivation for LightThinker++ arises from the cognitive-economy principle observed in human reasoning: only the most salient intermediate results are retained for ongoing deliberation, with details deferred until needed. In the LLM context, naïvely generating long chain-of-thought (CoT) traces makes the context grow linearly with the number of reasoning steps. For a trace of $n$ tokens, the transformer's memory (the key-value cache) therefore scales as $O(n)$ and the total attention cost as $O(n^2)$. For complex tasks or extended interaction (e.g., agentic deployments, multi-step proofs), this growth is unsustainable and triggers failure modes including context truncation and degraded performance (Zhang et al., 21 Feb 2025, Zhu et al., 4 Apr 2026).
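A toy count makes this scaling concrete. The sketch below tallies the peak key-value cache size and the total attention work, counting each new token as attending to every cached token. It compares naive CoT with a scheme that keeps only a few tokens of each finished step. The function name, the fixed step length, and the number of retained tokens are hypothetical choices. The sketch illustrates the asymptotics only and does not model LightThinker++'s actual compression.

```python
def kv_and_attention(step_tokens: int, n_steps: int, kept_per_step: int) -> tuple[int, int]:
    """Return (peak KV-cache entries, total attention work) for a toy reasoning trace.

    Each step generates `step_tokens` tokens; after a step finishes, only
    `kept_per_step` of its tokens remain cached (kept_per_step == step_tokens
    models naive chain-of-thought with no compression). Prompt tokens are ignored.
    """
    kv = 0      # tokens currently held in the KV cache
    peak = 0    # largest cache size seen
    attn = 0    # sum over generated tokens of how many cached entries each attends to
    for _ in range(n_steps):
        for _ in range(step_tokens):
            kv += 1        # the new token's key/value is appended
            attn += kv     # and it attends to every cached token, itself included
        peak = max(peak, kv)
        kv -= step_tokens - kept_per_step  # drop the finished step's uncompressed remainder
    return peak, attn


naive = kv_and_attention(step_tokens=100, n_steps=10, kept_per_step=100)
compressed = kv_and_attention(step_tokens=100, n_steps=10, kept_per_step=10)
print(naive)       # (1000, 500500): peak grows as n, attention as n(n+1)/2
print(compressed)  # (190, 95500)
```

With 1,000 generated tokens, the naive trace peaks at 1,000 cached entries and performs $n(n+1)/2 = 500{,}500$ attention operations. Keeping 10 tokens per finished step caps the peak at 190 and reduces the work to 95,500. These figures follow only from the toy parameters, not from any reported result.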
While prompt engineering and tokenwise pruning offer partial remedies, they are either heuristic or introduce high control latency. LightThinker++ is designed to let the LLM itself learn when and how to compress, archive, or recover intermediate results, supporting both short-term efficiency and long-range coherence (Zhu et al., 4 Apr 2026).
2. Architectural Principles and Memory Primitives
LightThinker++ generalizes the static gist-token approach of LightThinker by introducing dual-form representations and explicit memory primitives:
- Reasoning Entities: Each intermediate step is stored as , where is