GlobEnc: Global Token Attribution Framework
- GlobEnc is a comprehensive framework that integrates self-attention, residual connections, and layer normalization to compute global token attributions in Transformer encoders.
- It employs norm-based decomposition and recursive aggregation to produce attribution scores that closely align with gradient-based saliency metrics across various NLP tasks.
- Experimental results on models like BERT and ELECTRA demonstrate GlobEnc’s effectiveness in improving interpretability and enabling efficient prompt compression.
GlobEnc is a global token attribution framework for interpreting Transformer encoder models, designed to quantify the contribution of each input token to the model’s prediction by aggregating information across all encoder components and layers. By integrating self-attention, residual connections, and layer normalization effects within the encoder block and propagating these attributions through the entire stack, GlobEnc addresses critical shortcomings in prior methods that relied exclusively on attention patterns or local norm-based analysis. It provides faithful global token attributions, validated by strong alignment with gradient-based saliency scores, and is applicable across a range of NLP tasks for both model interpretability and prompt compression.
1. Framework Motivation and Scope
GlobEnc was developed in response to limitations observed in existing Transformer attribution methods, which typically focus solely on attention weights or restrict analysis to a narrow subset of encoder components. Classical norm-based or rollout approaches often omit residual connections and the full effect of layer normalization, leading to incomplete or sometimes misleading token importance scores. GlobEnc expands the attribution scope by encompassing nearly all operations within the encoder block (self-attention, both residual paths, and the layer-norm transformations), with the sole exception of the direct nonlinear effect of feed-forward networks (FFNs), which is approximated.
The result is a holistic global token attribution method that yields much more accurate and meaningful salience maps. This is particularly pertinent given the hierarchical, compositional nature of Transformers, where critical information may be propagated and mixed non-trivially through successive layers and both linear and normalization operations.
2. Technical Methodology
GlobEnc employs a norm-based decomposition of encoder layers in the attribution analysis. For a given encoder layer, the attention-block output (before the first layer normalization) for token $i$ is expressed as

$$\tilde{z}_i = \sum_{j}\sum_{h} \alpha^{h}_{i,j}\, f^{h}(x_j) + x_i,$$

where $\alpha^{h}_{i,j}$ are the self-attention weights of head $h$, $f^{h}(\cdot)$ is the value transformation of head $h$, and $x_i$ is the layer input entering via the residual link.
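To make the decomposition concrete, the following minimal sketch (not the GlobEnc implementation) assumes per-head attention weights `attn` of shape `(heads, seq, seq)`, value-transformed inputs `fx` of shape `(heads, seq, dim)` corresponding to $f^{h}(x_j)$, and the layer inputs `x`; it builds per-source-token contribution vectors whose sum over $j$ recovers $\tilde{z}_i$.

```python
import torch

def attention_block_contributions(attn, fx, x):
    """Per-source-token contribution vectors for the attention block.

    attn: (heads, seq, seq) attention weights alpha^h_{i,j}
    fx:   (heads, seq, dim) value-transformed inputs f^h(x_j)
    x:    (seq, dim)        layer inputs (residual branch)

    Returns F of shape (seq, seq, dim), where F[i, j] is the contribution of
    input token j to the pre-LayerNorm output, so F.sum(dim=1) equals the
    multi-head attention output plus the residual.
    """
    # sum over heads of alpha^h_{i,j} * f^h(x_j)
    F = torch.einsum('hij,hjd->ijd', attn, fx)
    # the residual connection routes x_i entirely to the diagonal term (j == i)
    idx = torch.arange(x.size(0))
    F[idx, idx] += x
    return F
```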
This attention-block output is subject to two layer-normalization steps and an FFN + residual path, such that the final representation for token $i$ becomes

$$z_i = \mathrm{LN}_2\big(\mathrm{FFN}(\mathrm{LN}_1(\tilde{z}_i)) + \mathrm{LN}_1(\tilde{z}_i)\big).$$

Direct attribution through the non-linear FFN is infeasible, but GlobEnc propagates the decomposed contributions through a normalization function derived from the layer norm,

$$g_{z}(y) = \frac{y - m(y)}{s(z)} \odot \gamma,$$

where $m(\cdot)$ and $s(\cdot)$ are the feature-wise mean and standard deviation and $\gamma$ is the learned scaling vector.
Writing the contribution of input token $j$ to $\tilde{z}_i$ as $F_i(x_j)$ (the $j$-th term of the decomposition above, with the residual $x_i$ assigned to the $j = i$ term), the token-wise attribution from $j$ to $i$ after the whole encoder block is then approximated by

$$N_{i,j} \approx \big\| g_2\big(g_1(F_i(x_j))\big) \big\|,$$

where $g_1$ and $g_2$ apply the scaling of the first and second layer normalization, respectively, while omitting the FFN's nonlinear mixing.
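A companion sketch of the per-layer attribution matrix, again illustrative rather than the repository's code: it applies the layer-norm-derived scaling twice and takes vector norms. The `std1`, `gamma1`, `std2`, `gamma2` arguments (per-token standard deviations and learned scales of the two layer norms) are assumptions about how one might pass these statistics.

```python
def ln_scale(y, std, gamma):
    """Layer-norm-derived map g_z(y) = (y - mean(y)) / s(z) * gamma.

    y:     (seq, seq, dim) contribution vectors to be rescaled
    std:   broadcastable to y's leading dims (per-output-token std of z)
    gamma: (dim,) learned LayerNorm scaling vector
    """
    return (y - y.mean(dim=-1, keepdim=True)) / std * gamma

def layer_attribution_matrix(F, std1, gamma1, std2, gamma2):
    """Approximate per-layer attribution N[i, j] = || g_2(g_1(F[i, j])) ||."""
    scaled = ln_scale(F, std1.view(-1, 1, 1), gamma1)        # first LayerNorm scaling
    scaled = ln_scale(scaled, std2.view(-1, 1, 1), gamma2)   # second LayerNorm scaling (FFN skipped)
    return scaled.norm(dim=-1)                               # (seq, seq) attribution matrix
```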
Global attribution across the full Transformer is obtained by recursive aggregation (similar to attention rollout), where the layer-level attribution matrices are combined as

$$\tilde{N}^{(\ell)} = \hat{N}^{(\ell)}\, \tilde{N}^{(\ell-1)}, \qquad \ell = 1, \dots, L,$$

with each $\hat{N}^{(\ell)}$ being a row-normalized, residual-adjusted attribution matrix for layer $\ell$, computed from the norm-based token contributions above.
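The recursive aggregation itself reduces to row normalization plus a running matrix product. A minimal sketch, assuming a bottom-up list of per-layer `(seq, seq)` attribution matrices:

```python
def rollout(layer_mats):
    """Aggregate per-layer attribution matrices into a global one.

    Each matrix is row-normalized so every output token's incoming attribution
    sums to 1 (residual contributions already sit on the diagonal), then the
    matrices are multiplied bottom-up, mirroring attention rollout.
    """
    global_attr = None
    for N in layer_mats:
        N_hat = N / N.sum(dim=-1, keepdim=True)   # row-normalize
        global_attr = N_hat if global_attr is None else N_hat @ global_attr
    return global_attr  # (seq, seq); the [CLS] row gives per-token global scores
```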
This process yields a cumulative global attribution score for each input token, reflecting its influence on the output (for example, the [CLS] token in classification).
3. Experimental Validation and Results
GlobEnc was validated across classification tasks (SST-2, MNLI, HateXplain) and various Transformer architectures (BERT-base, BERT-large, ELECTRA) (Modarressi et al., 2022). The primary evaluation used Spearman’s rank correlation between GlobEnc’s global attribution scores (for the [CLS] token) and gradient-based saliency values (gradient×input).
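This faithfulness check can be reproduced in a few lines. The sketch below uses real HuggingFace and SciPy APIs but a placeholder `globenc_scores` vector standing in for the attributions under evaluation; it computes gradient×input saliency per token and measures Spearman correlation against it. The untuned `bert-base-uncased` classification head is used only for illustration; a fine-tuned checkpoint would be used in practice.

```python
import torch
from scipy.stats import spearmanr
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

enc = tok("the movie was a meaningful experience", return_tensors="pt")
embeds = model.get_input_embeddings()(enc["input_ids"])
embeds.retain_grad()
logits = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"]).logits
logits[0, logits[0].argmax()].backward()

# gradient x input, reduced to one saliency score per token
saliency = (embeds.grad * embeds).sum(-1).abs().squeeze(0)

# globenc_scores: per-token global attributions for [CLS] (hypothetical placeholder)
globenc_scores = torch.rand_like(saliency)
rho, _ = spearmanr(globenc_scores.detach().numpy(), saliency.detach().numpy())
print(f"Spearman correlation: {rho:.2f}")
```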
Quantitative results indicate that GlobEnc consistently outperforms earlier methods:
- SST-2: GlobEnc achieves a correlation of $0.77$, versus lower scores for prior attention/norm-only approaches.
- MNLI: $0.78$ correlation (GlobEnc), compared to lower baselines.
- HateXplain: $0.72$ correlation (GlobEnc), again leading other methods.
Qualitative analysis demonstrates that GlobEnc accurately highlights the most influential tokens, e.g., identifying “meaningful” in positive sentiment sentences or “waste” in negative cases. Early layers concentrate attribution on [CLS], while deeper layers shift to target-relevant content words, illustrating progressive contextualization.
4. Practical Applications and Implications
GlobEnc’s global attribution analysis has several concrete applications:
- Model Interpretability: Reveals which tokens drive transformer decisions, aiding in model debugging and trustworthy system deployment.
- Prompt Compression: GlobEnc underpins frameworks such as FrugalPrompt (Raiyan et al., 18 Oct 2025), where attribution scores are used to filter out low-utility tokens, substantially reducing prompt length with only marginal loss in downstream performance on conventional NLP tasks (see the sketch following this list).
- Analysis of Downstream Tasks: Enables inspection of token contributions in sentiment analysis, inference, and hate speech detection, offering insights into model biases and decision patterns.
- Research Foundation: Sets a precedent for future interpretability research that may extend to nonlinear FFN attribution or be adapted to novel Transformer architectures.
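As referenced in the prompt-compression item above, a token-filtering step of this kind is straightforward once per-token scores are available. A minimal sketch, assuming a `scores` list aligned with the prompt's tokens (FrugalPrompt's actual scoring, normalization, and reassembly may differ):

```python
import numpy as np

def compress_prompt(tokens, scores, keep_ratio=0.5):
    """Keep the highest-attribution tokens, preserving their original order.

    tokens:     list of prompt tokens
    scores:     per-token attribution scores (e.g., GlobEnc [CLS] attributions)
    keep_ratio: fraction of tokens to retain
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-k:]   # indices of the top-k tokens
    keep = np.sort(keep)             # restore original ordering
    return [tokens[i] for i in keep]

# example
tokens = "the film was a waste of two hours".split()
scores = [0.02, 0.05, 0.03, 0.01, 0.60, 0.04, 0.10, 0.15]
print(compress_prompt(tokens, scores, keep_ratio=0.4))  # ['waste', 'two', 'hours']
```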
Case studies with prompt compression (Raiyan et al., 18 Oct 2025) further demonstrate that GlobEnc-extracted high-salience tokens are sufficient for tasks such as sentiment analysis, summarization, and QA, but not for mathematical reasoning (which requires exhaustive context).
5. Comparison to Prior Methods
GlobEnc advances beyond earlier attention-only or local norm-based attribution by:
- Aggregating contributions from all encoder components (attention, residuals, layer norms), rather than truncating at self-attention or ignoring contextual mixing.
- Providing more robust global attribution via recursive aggregation across layers.
- Achieving higher experimental correlations with gradient-based saliency, indicating greater faithfulness.
- Maintaining computational efficiency and model generalization across architectures.
In contrast, methods such as DecompX (used alongside GlobEnc in FrugalPrompt) leverage full vector decomposition, including nonlinear propagation through FFNs, at higher computational cost and with variable improvement depending on the task.
6. Code Availability and Implementation
The GlobEnc codebase is available at https://github.com/mohsenfayyaz/GlobEnc and is compatible with HuggingFace Transformers and PyTorch. Users need a pre-trained Transformer encoder (e.g., BERT or ELECTRA); the repository provides instructions for installing dependencies, running experiments, and generating attribution maps.
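As an illustration of the raw ingredients such a pipeline consumes (the repository's own API and entry points may differ), per-layer attention weights and hidden states can be pulled from any HuggingFace encoder as follows:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

enc = tok("GlobEnc attributes predictions to input tokens.", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_attentions=True, output_hidden_states=True)

# per-layer ingredients for a norm-based attribution analysis:
attentions = out.attentions        # tuple of (batch, heads, seq, seq), one per layer
hidden_states = out.hidden_states  # tuple of (batch, seq, dim), embeddings + each layer
print(len(attentions), attentions[0].shape)
```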
This supports ready adoption for research, interpretability, and downstream analytical pipelines in NLP.
7. Significance and Limitations
GlobEnc represents a substantive advancement in Transformer attribution, offering more comprehensive and faithful token importance quantification than prior mechanisms. Its integration of multiple encoder components and recursive rollout produces attribution maps aligned with theoretical expectations and empirical gradient-based methods.
Limitations remain in direct attribution for nonlinear FFN components, which GlobEnc approximates but does not fully decompose. Additionally, like all attribution methods, GlobEnc depends on the assumptions underlying norm-based analysis and may require extension for specialized transformer architectures or highly structured input scenarios.
In summary, GlobEnc enables nuanced, reliable analysis of token contributions in large Transformer encoders, facilitates performance-aware prompt compression strategies, and bolsters the interpretability of transformer decisions across a range of NLP tasks (Modarressi et al., 2022, Raiyan et al., 18 Oct 2025).