- The paper proposes the LeaF framework that enhances LLM reasoning by using gradient-guided analysis to prune distracting tokens.
- It reports accuracy gains exceeding 20% on mathematical reasoning and over 10% on coding tasks by realigning student attention with that of a teacher model.
- The methodology establishes a causal approach to token pruning, which improves model interpretability and reliability in complex inference tasks.
Causal Attention Distillation: Enhancing Reasoning in LLMs
The paper "Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning" addresses a critical limitation in current LLMs: their susceptibility to distractive patterns and spurious correlations which impact reasoning accuracy and generation quality. The authors propose a novel two-stage framework, Learning to Focus (LeaF), to elevate the reasoning capabilities of LLMs by pruning confounding tokens that mislead the attention mechanism during inference.
Problem Statement and LeaF Framework
LLMs are adept at understanding and generating contextual content, yet they falter in long-context reasoning because of distracting patterns embedded in the training data. The authors' preliminary experiments indicate that LLMs can be misled by spurious correlations that disrupt genuine causal inference and inflate inference overhead. The paper hypothesizes that aligning the model's attention with truly pivotal tokens improves reasoning performance and yields more reliable outputs.
To validate this hypothesis, the authors introduce LeaF, a framework for causal attention distillation that operates in two stages. First, it uses gradient-based analysis and teacher-student comparison to identify confounding tokens: tokens that draw the student's attention but are largely ignored by the teacher are treated as spurious distractors. Second, these tokens are pruned during distillation, guiding the student's attention to align more closely with the teacher's attention on the genuinely critical context tokens.
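To make the first stage concrete, below is a minimal PyTorch-style sketch of how confounding tokens might be flagged by comparing gradient-based token importance between a teacher and a student. It assumes Hugging Face-style causal language models; the saliency rule (gradient times embedding), the per-sequence normalization, and the top-k selection are illustrative assumptions rather than the paper's exact formulation, and the function names are hypothetical.

```python
import torch


def token_saliency(model, input_ids, labels):
    """Per-token importance via gradient-times-embedding saliency."""
    # Detach the embeddings and re-enable grad so this works even when the
    # model (e.g. a frozen teacher) has requires_grad=False parameters.
    embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)
    loss = model(inputs_embeds=embeds, labels=labels).loss
    loss.backward()
    # L2 norm of (grad * embedding) gives one importance score per token.
    return (embeds.grad * embeds).norm(dim=-1).detach()          # (B, T)


def find_confounding_tokens(student, teacher, input_ids, labels, prune_ratio=0.1):
    """Flag tokens the student relies on heavily but the teacher largely ignores."""
    s_score = token_saliency(student, input_ids, labels)
    t_score = token_saliency(teacher, input_ids, labels)
    # Normalize within each sequence so the two models' scores are comparable.
    s_score = s_score / (s_score.sum(dim=-1, keepdim=True) + 1e-8)
    t_score = t_score / (t_score.sum(dim=-1, keepdim=True) + 1e-8)
    gap = s_score - t_score                                      # large gap = suspicious
    num_prune = max(1, int(prune_ratio * input_ids.size(1)))
    return gap.topk(num_prune, dim=-1).indices                   # (B, num_prune)
```

In this sketch, tokens with the largest student-minus-teacher importance gap are treated as candidate confounders, mirroring the intuition that they attract the student's focus while the teacher disregards them.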
Numerical Results and Implications
Experimental evidence underscores the efficacy of LeaF: models trained under this framework show substantial improvements on reasoning tasks. Specifically, LeaF-trained models report accuracy gains exceeding 20% on mathematical reasoning tasks and over 10% on coding tasks, as illustrated in the figures accompanying the paper. These results suggest a robust gain in reasoning consistency and inference reliability once distracting patterns are pruned.
Moreover, the methodology detailed in the paper offers a way to model attention dynamics within a causal framework inspired by Pearl's Structural Causal Model. By identifying and removing spurious correlations through targeted token pruning, the paper presents a mechanism for constructing counterfactual samples, enabling student models to learn genuine causal dependencies more efficiently.
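The sketch below illustrates how a counterfactual sample could be constructed by pruning the flagged tokens and then distilling the student toward the teacher on that pruned context. The paper aligns attention on critical tokens; this sketch substitutes a generic logit-level distillation objective (language-modeling loss plus temperature-scaled KL), so the loss form, the `tau` and `alpha` hyperparameters, and the helper names are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def prune_tokens(input_ids, confounder_idx):
    """Drop the flagged positions from a single sequence (1-D tensor of ids)."""
    keep = torch.ones_like(input_ids, dtype=torch.bool)
    keep[confounder_idx] = False
    return input_ids[keep]


def counterfactual_distill_loss(student, teacher, input_ids, confounder_idx,
                                tau=2.0, alpha=0.5):
    """LM loss on the pruned (counterfactual) sample plus KL toward the teacher."""
    pruned = prune_tokens(input_ids, confounder_idx).unsqueeze(0)    # (1, T')
    s_out = student(pruned, labels=pruned)   # HF-style models shift labels internally
    with torch.no_grad():
        t_logits = teacher(pruned).logits
    # Temperature-scaled KL between student and teacher next-token distributions.
    kl = F.kl_div(
        F.log_softmax(s_out.logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2
    return alpha * s_out.loss + (1.0 - alpha) * kl
```

Training the student only on the pruned context is what makes the sample counterfactual: the distractors are intervened away, so any remaining teacher-student disagreement reflects dependencies on the genuinely causal tokens.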
Theoretical and Practical Implications
The implications of this research are notable. Theoretically, it establishes a causal approach to attention distillation, inviting further exploration of fine-grained token-level interventions in model training. Practically, the framework promises tangible gains in applications that require logical deduction and complex problem-solving, such as automated theorem proving and code generation.
Speculative Future Directions
Considering the promising outcomes of LeaF, future research could explore adaptive models that self-prune confounding tokens without relying on a stronger teacher model, simplifying practical deployment. Additionally, applying the approach to other domains such as healthcare data processing or scientific research could further demonstrate LeaF's versatility and adaptability.
Conclusion
In conclusion, "Learning to Focus" introduces a compelling framework, LeaF, that rigorously enhances the reasoning capabilities of LLMs by prioritizing genuine causal attention over spurious distractions. This paper not only provides substantial experimental validation but also charts a promising trajectory in fine-tuning LLMs for improved interpretability and performance in complex reasoning tasks.