Neural Garbage Collection: Learning to Forget while Learning to Reason
This presentation explores how language models can learn to manage their own memory efficiently while solving complex reasoning tasks. The work demonstrates that memory management can be treated as a learnable capability rather than relying on hand-crafted rules, achieving dramatic improvements in efficiency without sacrificing accuracy. Through end-to-end reinforcement learning, models learn both what to remember and what to forget, opening new possibilities for resource-aware artificial intelligence.

Script
What happens when language models memorize everything during reasoning? They hit a wall. Every step of chain-of-thought reasoning bloats their memory cache, and existing solutions rely on rigid hand-crafted rules that can't adapt to the task at hand.
The authors reframe this bottleneck as a learning problem. Their system, Neural Garbage Collection, treats memory management as a sequence of decisions the model makes itself. At designated rounds, the model scores every entry in its cache using attention, samples which ones to keep via a technique called Gumbel top-k, and prunes the rest before continuing.
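That score-sample-prune step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `gumbel_top_k` and `prune_cache`, the toy cache, and the hand-set scores are all assumptions for the example; in the real system the scores come from the model's own attention weights over its cache.

```python
import numpy as np

def gumbel_top_k(scores, k, rng):
    """Sample k indices without replacement, weighted by softmax(scores),
    by perturbing the scores with Gumbel noise and taking the top k."""
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(scores + gumbel)[-k:]

def prune_cache(cache, attn_scores, k, rng):
    """Keep only k sampled entries of the cache; evict the rest."""
    keep = np.sort(gumbel_top_k(attn_scores, k, rng))
    return [cache[i] for i in keep]

rng = np.random.default_rng(0)
cache = [f"entry_{i}" for i in range(8)]  # stand-in for KV-cache entries
scores = np.array([2.0, 0.1, 1.5, 0.0, 3.0, 0.2, 1.0, 0.5])  # hypothetical attention scores
kept = prune_cache(cache, scores, k=4, rng=rng)
print(kept)  # 4 surviving entries, biased toward the high-scoring ones
```

Because the keep set is sampled rather than chosen greedily, the eviction policy stays stochastic during training, which is what lets a policy-gradient signal assign credit to individual keep/evict decisions.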
Here's the breakthrough: both reasoning and forgetting are optimized together using policy gradient reinforcement learning from task reward alone. The model learns what to remember not from proxy objectives or supervision, but from whether its pruned memory lets it solve the problem correctly.
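A toy version of that training signal, assuming a simplified Bernoulli keep/evict policy updated with REINFORCE and a running-mean baseline; the reward shaping and helper name `reinforce_step` are illustrative, not taken from the paper:

```python
import numpy as np

def reinforce_step(logits, actions, reward, baseline=0.0, lr=0.2):
    """One REINFORCE update for independent Bernoulli keep/evict decisions.
    logits: per-entry keep logits; actions: sampled 0/1 keep mask;
    reward: scalar task reward (e.g. 1 if the final answer was correct)."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    grad_logp = actions - probs  # d/dlogits of sum log pi(actions)
    return logits + lr * (reward - baseline) * grad_logp

rng = np.random.default_rng(1)
logits, baseline = np.zeros(6), 0.0
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-logits))
    acts = (rng.uniform(size=6) < probs).astype(float)
    # Toy reward: the "task" needs only entries 0 and 1 to succeed,
    # and every kept entry carries a small memory cost.
    reward = float(acts[0] and acts[1]) - 0.05 * acts.sum()
    logits = reinforce_step(logits, acts, reward, baseline)
    baseline = 0.9 * baseline + 0.1 * reward  # running-mean baseline
# keep-probabilities should drift high for entries 0-1, low elsewhere
print(np.round(1.0 / (1.0 + np.exp(-logits)), 2))
```

The only learning signal is the scalar reward: there is no label saying which entries matter, yet the policy learns to keep the useful ones and evict the rest, which is the core idea scaled down to six cache entries.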
The evidence is decisive. On arithmetic reasoning, Neural Garbage Collection achieves 49.6 percent accuracy at 2.4 times cache reduction, more than double the accuracy of the next-best baseline. Across mathematical competition problems, it maintains robust performance at 3 to 5 times compression, consistently beating every heuristic method.
Ablation studies reveal why this works. End-to-end training and correct handling of eviction masks are essential. When the authors tried targeted dropout without accounting for off-policy effects, gradients exploded and reward collapsed. Neural Garbage Collection remained stable throughout, proving that efficiency must be learned the same way as capability.
This work proves that models can govern their own memory through learned policy, not prescribed rules. By treating resource allocation as a native action, it opens the door to truly adaptive computation where efficiency and reasoning co-evolve. To explore more research like this and create your own videos, visit EmergentMind.com.