Linear Semantic Attribution (LSA)
- LSA is a framework that approximates the effect of feature subsets via a linear least-squares model, offering clear and interpretable attributions.
- The method is grounded in axioms from game theory, such as symmetry and dummy player properties, ensuring fair and consistent feature scoring.
- LSA extends to quantify interactions and address issues like semantic leakage in NLP, vision-language models, and optimization problems.
Linear Semantic Attribution (LSA) designates a set of theoretically grounded frameworks for assigning additive or decomposed importance scores to input features so as to explain model predictions or semantic relationships. The unifying principle is a linear, typically least-squares, approximation of the effect of subsets (coalitions) of features or concepts on model output, sometimes linked to game-theoretic values, sensitivity analysis, or embedding geometry. LSA has become central to model interpretability in domains including natural language processing, vision-language modeling, and optimization.
1. Core Principles and Mathematical Formulation
Linear Semantic Attribution describes approaches that approximate the effect of sets of features, words, or concepts on a model output by a linear function whose coefficients are the attributions. Formally, for an input $x = (x_1, \dots, x_d)$ and model $f$, one defines a characteristic function $v(S) = f(x_S)$, where $x_S$ is the input with all but the indices in $S$ ablated or replaced by a neutral baseline. The objective is to find attribution scores $\psi \in \mathbb{R}^d$ such that
$$v(S) \approx \sum_{i \in S} \psi_i$$
for all $S$ in a specified family $\mathcal{S}$ of coalitions, typically tree nodes or a combinatorial family related to the data structure.
The least-squares solution is given by
$$\hat{\psi} = \arg\min_{\psi \in \mathbb{R}^d} \sum_{S \in \mathcal{S}} \Big( v(S) - \sum_{i \in S} \psi_i \Big)^2,$$
which may be solved by the normal equations or the pseudoinverse. In the LS-Tree framework, $\mathcal{S}$ is the set of parse tree constituents for linguistic data (Chen et al., 2019).
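The least-squares fit described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function `ls_attribution`, the toy value function `toy_value`, and the four-token parse `tree_nodes` are hypothetical and are not taken from the LS-Tree implementation. Each coalition becomes a 0/1 membership row, and the attribution vector (`psi` in the code) is the least-squares solution against the coalition values.

```python
import numpy as np

def ls_attribution(value_fn, coalitions, d):
    """Least-squares attributions psi such that v(S) ~ sum of psi_i for i in S."""
    # Row r of X is the 0/1 membership indicator of coalition r.
    X = np.zeros((len(coalitions), d))
    for row, S in enumerate(coalitions):
        X[row, list(S)] = 1.0
    y = np.array([value_fn(S) for S in coalitions], dtype=float)
    # lstsq returns the minimum-norm (pseudoinverse) solution if X is rank-deficient.
    psi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return psi

# Hypothetical toy "model": additive per-token scores plus one pairwise bonus.
weights = np.array([1.0, -2.0, 0.5, 3.0])

def toy_value(S):
    S = set(S)
    bonus = 1.5 if {2, 3} <= S else 0.0
    return float(sum(weights[i] for i in S) + bonus)

# Constituents of a hypothetical binary parse ((0 1) (2 3)).
tree_nodes = [(0,), (1,), (2,), (3,), (0, 1), (2, 3), (0, 1, 2, 3)]
psi = ls_attribution(toy_value, tree_nodes, d=4)
print(psi)
```

With a purely additive value function the fit recovers the per-token scores exactly; the pairwise bonus in `toy_value` is what pushes the attributions of tokens 2 and 3 away from their additive scores.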
The same principle is extended to other settings, such as:
- Vision-language models: Attribution maps are linear in the query embedding, leading to leakage and hallucination phenomena unless rectified (Bilgiç et al., 8 Jun 2026).
- Linear optimization problems: Attribution scores are sensitivities (gradients) or perturbation effects of LP parameters on optimal output (Steinmann et al., 2022).
2. Axiomatic Justification and Connections to Game Theory
The least-squares LSA solution is uniquely determined by axioms adapted from the Banzhaf value in cooperative game theory:
- Symmetry: Swapping two features with identical marginal effect yields equal attributions.
- Dummy (Null) player: Features that contribute additively and independently receive their singleton marginal value.
- Marginal-contribution invariance: Only first-order marginal effects matter.
- 2-Efficiency (Merge invariance): Merging two features yields attribution equal to the sum of individual attributions.
For parse tree–restricted families, the unique extension of the Banzhaf value satisfying these axioms coincides with the least-squares attribution vector $\hat{\psi}$ (Chen et al., 2019). This situates LSA as a principled attribution framework constrained by fairness and consistency in feature scoring.
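The link between least squares and the Banzhaf value can be checked numerically in the classical unrestricted setting. The sketch below is not the tree-restricted result of Chen et al.; it assumes the full family of all $2^d$ coalitions and, unlike the formulation above, includes an intercept column. The random game `v` is hypothetical. Under those assumptions the linear coefficients of the best least-squares approximation equal the Banzhaf values.

```python
import itertools
import numpy as np

d = 4
rng = np.random.default_rng(0)
subsets = [S for r in range(d + 1) for S in itertools.combinations(range(d), r)]
# Hypothetical random game: one value per coalition, with v(empty set) = 0.
v = {S: float(rng.normal()) for S in subsets}
v[()] = 0.0

def banzhaf(i):
    """Average marginal contribution of feature i over all coalitions without i."""
    others = [j for j in range(d) if j != i]
    total = 0.0
    for r in range(d):  # coalition sizes 0 .. d-1 drawn from the other features
        for S in itertools.combinations(others, r):
            with_i = tuple(sorted(S + (i,)))
            total += v[with_i] - v[S]
    return total / 2 ** (d - 1)

banzhaf_values = np.array([banzhaf(i) for i in range(d)])

# Least squares over the full family, with an intercept in column 0 (an assumption).
X = np.array([[1.0] + [1.0 if i in S else 0.0 for i in range(d)] for S in subsets])
y = np.array([v[S] for S in subsets])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ls_values = coef[1:]

print(np.allclose(banzhaf_values, ls_values))
```

Restricting the family to parse tree constituents changes the design matrix, so the agreement shown here should not be read as a demonstration of the restricted theorem.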
3. Extensions: Interaction Quantification and Orthogonal Projections
LSA generalizes beyond first-order scores:
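- Interaction Scores: Using Cook's distance–like statistics, interactions between sibling nodes in a parse tree can be measured as the change in the least-squares solution $\hat{\psi}$ when excluding a node and its ancestors. Formally, writing $\hat{\psi}_{(-S)}$ for the solution refit without those coalitions and $X$ for the coalition-membership design matrix, a score of the form
$$D_S \propto (\hat{\psi}_{(-S)} - \hat{\psi})^\top X^\top X \, (\hat{\psi}_{(-S)} - \hat{\psi})$$
quantifies the non-additivity or synergistic effect of constituents (Chen et al., 2019).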
- Linear Semantic Leakage & Hallucination: In high-dimensional embedding space, linearity in concept embeddings causes shared components ("ghosts") to appear in attributions, explaining semantic hallucination. Formally, for a query embedding $q$, the attribution becomes a sum of true and ghost terms (Bilgiç et al., 8 Jun 2026).
- Orthogonal Semantic Projection (OSP): To eliminate ghost components, OSP orthogonalizes query embeddings against a dictionary of distractors via Orthogonal Matching Pursuit, purifying the attribution and ensuring that only unique semantic directions contribute. This transformation provably removes shared attribution (Bilgiç et al., 8 Jun 2026).
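The leakage and projection ideas above can be illustrated with a small NumPy sketch. It is a generic Orthogonal Matching Pursuit residual on hand-built vectors, not the authors' OSP implementation: the function `omp_purify`, the basis directions, and the two "patch" embeddings are all hypothetical. The query shares one direction with a distractor, so a patch showing only the distractor receives a nonzero linear saliency (the ghost term) until the query is orthogonalized.

```python
import numpy as np

def omp_purify(query, distractors, n_atoms):
    """Greedy OMP: return the part of `query` orthogonal to the selected distractors."""
    D = distractors / np.linalg.norm(distractors, axis=1, keepdims=True)
    residual = query.astype(float).copy()
    selected = []
    for _ in range(min(n_atoms, len(D))):
        scores = np.abs(D @ residual)
        if selected:
            scores[selected] = -np.inf       # never pick an atom twice
        k = int(np.argmax(scores))
        if scores[k] < 1e-12:
            break                            # nothing correlated is left to remove
        selected.append(k)
        A = D[selected].T                    # shape (dim, n_selected)
        coef, *_ = np.linalg.lstsq(A, query, rcond=None)
        residual = query - A @ coef          # orthogonal to every selected atom
    return residual, selected

e = np.eye(8)
unique, shared, other = e[0], e[1], e[2]     # hypothetical semantic directions
query = unique + 0.8 * shared                # target concept leaks into `shared`
distractors = np.array([shared + other, e[3]])
patches = np.array([unique, shared + other]) # target patch, distractor-only patch

raw = patches @ query                        # linear saliency before purification
purified, chosen = omp_purify(query, distractors, n_atoms=1)
clean = patches @ purified                   # linear saliency after purification
print(raw, clean, chosen)
```

In this toy setup the distractor-only patch scores 0.8 before purification and 0 afterwards, while the target patch keeps its score of 1. Real embeddings are not exactly orthogonal, so the choice of dictionary and of `n_atoms` would matter in practice.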
4. Empirical Applications and Illustrative Results
Applications span multiple domains:
- Natural language model interpretation: LS-Tree value attribution, applied to BERT, CNN, and LSTM models on sentiment datasets (SST, IMDB, Yelp), diverges substantially from bag-of-words baselines (Pearson correlations for BERT: 0.465, 0.321, 0.476). This indicates strong nonlinearity and compositionality, and higher-order interactions localize at parse tree nodes corresponding to adversative conjunctions ("but", "yet") (Chen et al., 2019).
- Vision-language Attribution: OSP demonstrates quantitative improvements in attribution faithfulness and reduction of hallucination in CLIP-based models: mIoU increases by ∼3–8 points, AUROC by 1–15 points. OSP heatmaps suppress spurious activations and align closely with true concept presence, as confirmed by large-scale user studies and perturbation-based metrics (Bilgiç et al., 8 Jun 2026).
- Explaining Linear Programs: LSA for LPs uses classical sensitivity derivatives (gradients of the optimal output with respect to LP parameters such as cost coefficients and constraint bounds) and perturbation-based attributions. Evaluation on toy LPs, max-flow, and industrial energy models shows pathologies for plain gradient attributions (scale sensitivity, tie-breaking in multi-optima), with occlusion-based LSA revealing block-level driver parameters and robust explanations (Steinmann et al., 2022).
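The two LP attribution styles mentioned in the last item can be contrasted on a toy problem. The sketch below is an assumption-laden illustration, not the procedure of Steinmann et al.: the two-variable LP, the brute-force vertex solver `solve_lp_2d`, the finite-difference step, and the zero baseline for occlusion are all hypothetical choices made to keep the example self-contained.

```python
import itertools
import numpy as np

def solve_lp_2d(c, A, b):
    """Maximize c @ x subject to A @ x <= b by enumerating vertices (2 variables only)."""
    best_x, best_val = None, -np.inf
    for i, j in itertools.combinations(range(len(b)), 2):
        M = A[[i, j]]
        if abs(np.linalg.det(M)) < 1e-12:
            continue                          # parallel constraints, no vertex
        x = np.linalg.solve(M, b[[i, j]])
        if np.all(A @ x <= b + 1e-9) and c @ x > best_val:
            best_x, best_val = x, float(c @ x)
    return best_x, best_val

# Hypothetical toy LP: max 3x + 2y  s.t.  x + y <= 4, x <= 3, y <= 2.5, x >= 0, y >= 0.
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([4.0, 3.0, 2.5, 0.0, 0.0])
x_star, z_star = solve_lp_2d(c, A, b)

# Gradient-style attribution: finite-difference sensitivity of the optimal
# value to each right-hand side.
eps = 1e-4
rhs_sensitivity = np.array([
    (solve_lp_2d(c, A, b + eps * np.eye(len(b))[k])[1] - z_star) / eps
    for k in range(len(b))
])

# Occlusion-style attribution: set one cost coefficient to a zero baseline,
# re-solve, and record the drop in the optimal value.
occlusion = np.array([
    z_star - solve_lp_2d(np.where(np.arange(len(c)) == k, 0.0, c), A, b)[1]
    for k in range(len(c))
])
print(x_star, z_star, rhs_sensitivity, occlusion)
```

Only the two active constraints receive nonzero sensitivity, which matches the usual dual interpretation at a nondegenerate optimum. The occlusion scores depend on the chosen baseline, so a different baseline would give different numbers.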
5. Algorithmic and Computational Aspects
Summary of primary computational steps:
| Domain | LSA Algorithmic Core | Complexity |
|---|---|---|
| NLP: parse tree | Parse, evaluate $v(S)$ per node, least-squares solve; DFS with Sherman–Morrison for interactions | $O(d^3)$ for a dense inversion over $d$ features, plus parsing cost |
| Vision-language | Linear saliency computation, dictionary OMP for OSP | OMP cost grows with dictionary size and number of selected atoms; batch matrix ops |
| Linear programs | Sensitivity analysis (one solve) or perturb-resolve per param/group | No additional solve per gradient (once basis known); one LP solve per parameter or group for occlusion |
Sparsity and block grouping are often required to keep computational cost tractable in high-dimensional or long-input scenarios.
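The Sherman–Morrison step listed for the parse tree case can be sketched as a rank-one update of a least-squares fit. This is a simplified illustration: it deletes a single row rather than a node together with its ancestors, and the random design matrix `X`, the values `y`, and the helper names are hypothetical stand-ins for coalition indicators and coalition values. The point is that the inverse Gram matrix is computed once and reused for every deletion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 12, 4
X = rng.normal(size=(n, d))       # hypothetical design matrix (rows = coalitions)
y = rng.normal(size=n)            # hypothetical coalition values

G_inv = np.linalg.inv(X.T @ X)    # inverse Gram matrix, computed once
psi = G_inv @ X.T @ y             # full least-squares solution
resid = y - X @ psi
# Leverage h_ii = x_i^T (X^T X)^{-1} x_i for every row i.
leverage = np.einsum("ij,jk,ik->i", X, G_inv, X)

def psi_without_row(i):
    """Least-squares solution with row i deleted, via the Sherman-Morrison identity."""
    return psi - G_inv @ X[i] * resid[i] / (1.0 - leverage[i])

def influence(i):
    """Unnormalized Cook's-distance-style change in fitted values when row i is dropped."""
    delta = psi_without_row(i) - psi
    return float(delta @ (X.T @ X) @ delta)

# Cross-check one deletion against an explicit refit.
refit = np.linalg.lstsq(np.delete(X, 3, axis=0), np.delete(y, 3), rcond=None)[0]
print(np.allclose(psi_without_row(3), refit), influence(3))
```

Each deletion then costs a matrix-vector product instead of a fresh solve, which is where the saving comes from when many nodes are scored.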
6. Strengths, Limitations, and Practical Recommendations
Strengths of LSA include:
- Model-agnostic explanations: LSA methods provide local, model-agnostic feature attribution for any black-box or white-box predictor.
- Principled axiomatic characterization: Solutions are uniquely justified by game-theoretic fairness axioms.
- Interaction and nonlinearity diagnostics: Enable explicit measurement and localization of compositional effects and overfitting.
Limitations:
- Relies on structured input or embedding space geometry: E.g., parse tree errors and embedding leakage can propagate into attributions.
- Computation: Matrix inversion ($O(d^3)$ for a dense solve over $d$ input features) is prohibitive for long sequences; OSP and perturbation methods may be costly at scale.
- Limited to low-order interactions: Higher-order, cross-branch, or non-additive effects are not fully captured in canonical LSA.
- Baseline/reference choice: Especially for perturbation attributions, the ablation baseline strongly influences results.
Practical guidance:
- Small/specialized corpora: Count-based LSA (here in the sense of latent semantic analysis) outperforms prediction-based embeddings for low-resource domains (Altszyler et al., 2016).
- Vision-language models: OSP intervention is recommended to mitigate linear leakage and ensure robust, faithful attributions (Bilgiç et al., 8 Jun 2026).
- Optimization: Leverage classical sensitivity for LPs when possible; use perturbation-based LSA in ambiguous or multi-optima cases (Steinmann et al., 2022).
7. Historical Context and Conceptual Distinctions
The term "Linear Semantic Attribution" appears in contemporary literature under various guises, often specific to community conventions:
- LSA originally designated Latent Semantic Analysis, a distributional semantic method involving SVD of the term-document matrix (Altszyler et al., 2016).
- LS-Tree Value: In model interpretability for NLP, LSA refers to word and phrase attributions computed via least-squares regression over parse trees, characterized axiomatically by restricted Banzhaf values (Chen et al., 2019).
- Attribution in high-dimensional embeddings: LSA provides the theoretical foundation for analyzing and correcting hallucination in explainable AI for vision-language models (Bilgiç et al., 8 Jun 2026).
- Optimization XAI: LSA is now used for input parameter attribution in linear/mixed-integer programming (Steinmann et al., 2022).
A plausible implication is that as attribution methods mature, the notion of "Linear Semantic Attribution" is being unified across domains under the paradigm of linear, axiomatically grounded decompositions—whether at the level of language, vision, or optimization parameters—each carefully adapted to the topology and semantics of the underlying domain.