Score-Based Attribution Analysis Overview

Updated 12 November 2025
  • Score-based attribution analysis is a framework that assigns continuous, scalar scores to individual features to quantify their influence on machine learning models.
  • It employs methods such as Shapley values, integrated gradients, and Taylor expansions to capture both first-order effects and higher-order feature interactions.
  • The approach is applied across NLP, computer vision, data management, anomaly detection, and generative modeling to enhance model interpretability and diagnostic auditing.

Score-based attribution analysis refers to the family of methods, frameworks, and evaluation protocols that assign scalar importance scores to individual features (e.g., input tokens, pixels, training examples, database tuples) to quantify their influence on a model’s prediction, intermediate behavior, or data-driven output. These methods are widely used across machine learning, natural language processing, computer vision, data management, anomaly detection, and generative modeling. At their core, score-based attribution approaches provide a continuous, per-feature measure—rather than a binary or categorical assignment—enabling nuanced interpretation, diagnostic auditing, and explanation of complex models.

1. Mathematical Foundations of Score-Based Attribution

Attribution scores are typically defined via one of three mathematical paradigms: cooperative game-theoretic values (e.g., Shapley, Banzhaf, Möbius scores), gradient/integration-based path methods, or Taylor/interactions-based distributions of higher-order effects.

  • Game-theoretic approach: The Shapley value computes, for each feature $i$, the average marginal contribution across all possible feature coalitions. For input $x$,

$$\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[f_{S \cup \{i\}}(x) - f_S(x)\bigr].$$

This can be extended to interaction indices, Owen values for grouped features, and fractional allocations via the Weighted Möbius framework, which encompasses a broad class of attribution methods through linear combinations of multi-feature interactions (Jiang et al., 2023).
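
When the number of features is small, the sum above can be evaluated exactly by brute-force enumeration. The following minimal Python sketch assumes a user-supplied set function v (for instance, the model output with features outside the coalition replaced by a baseline); the helper name and toy value function are illustrative only:

```python
from itertools import combinations
from math import factorial

def shapley_values(v, n):
    """Exact Shapley values for a set function v over features {0, ..., n-1}.

    v takes a frozenset of feature indices and returns a scalar, e.g. the
    model output with all features outside the set replaced by a baseline.
    The enumeration visits all coalitions, so cost grows as 2^n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Shapley coalition weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(S | {i}) - v(S))
    return phi

# Toy check: additive model plus one pairwise interaction between 0 and 1.
x = [1.0, 2.0, 3.0]
def v(S):
    out = sum(x[j] for j in S)
    if {0, 1} <= S:
        out += 0.5 * x[0] * x[1]  # interaction term, split equally by Shapley
    return out

print(shapley_values(v, 3))  # [1.5, 2.5, 3.0]; sums to v(all) - v(empty) = 7.0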

  • Gradient/integration-based approach: Integrated Gradients (IG) measures feature importance as the path integral from a baseline $x'$ to the input $x$,

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial f\bigl(x' + \alpha(x - x')\bigr)}{\partial x_i}\, d\alpha.$$

This is interpreted as averaging first-order Taylor terms along a straight-line path and has been demonstrated to be closely linked to game-theoretic weights via limiting arguments (Deng et al., 2021, Jiang et al., 2023).
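
In practice the path integral is approximated by a Riemann sum along a discretized path. A minimal sketch, assuming access to a gradient oracle grad_f for a scalar-valued model (the function names and step count are illustrative):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    grad_f(z) returns the gradient of the scalar output f at point z.
    Gradients are averaged at the midpoints of `steps` equal intervals
    on the straight path from `baseline` to `x`, then scaled by (x - x').
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad

# Toy check: f(x) = x0**2 + 3*x1, so grad f(z) = [2*z0, 3].
grad_f = lambda z: np.array([2.0 * z[0], 3.0])
ig = integrated_gradients(grad_f, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(ig)  # ~[1.0, 6.0]; sums to f(x) - f(baseline) = 7.0 (completeness)
```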

  • Taylor expansion/interactions: Any score-based attribution can be represented as a reallocation of Taylor expansion terms of $f$ at a baseline $a$, distributing higher-order independent and interaction effects among features. For example, the Shapley value divides each order-$k$ interaction of a coalition $S$ among its members with weight $1/|S|$, preserving the completeness and symmetry axioms (Deng et al., 2023, Deng et al., 2021); a worked two-feature example follows this list.
  • Other paradigms: For anomaly detection, specialized characteristic functions minimize the anomaly score over "absent" features, integrating these behaviors into the Shapley framework with efficient heuristics (Takeishi et al., 2020). In probabilistic settings, attribution may be cast as inferring the posterior over per-feature perturbations required to restore normalcy, yielding distributions rather than point scores (Idé et al., 2023).
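
To make the Taylor/interaction reallocation concrete, consider the two-feature function $f(x_1, x_2) = b_1 x_1 + b_2 x_2 + c\,x_1 x_2$ with baseline $a = 0$. Averaging the two marginal contributions of feature 1 gives

$$\phi_1 = \tfrac{1}{2}\bigl[f(x_1, 0) - f(0, 0)\bigr] + \tfrac{1}{2}\bigl[f(x_1, x_2) - f(0, x_2)\bigr] = b_1 x_1 + \tfrac{1}{2}\, c\, x_1 x_2,$$

so each first-order term is assigned entirely to its own feature, the order-2 interaction $c\,x_1 x_2$ is split with weight $1/|S| = 1/2$ between its two members, and $\phi_1 + \phi_2 = f(x) - f(0)$, as completeness requires.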

2. Attribution Methods Across Domains

The score-based attribution framework applies to diverse contexts:

  • Natural Language Processing: Scores are assigned to input tokens to explain LLM predictions. Methods include Shapley Value Sampling (Monte Carlo Shapley on tokens), attention-weight proxies, and Integrated Gradients along embeddings. In low-resource prompt-based and fine-tuned models, Shapley sampling consistently yields explanations that are more aligned with human rationales and more faithful (model performance drops faster when high-attribution tokens are masked) than attention or IG, independent of base architecture or shot count (Zhou et al., 8 Mar 2024). A permutation-sampling sketch of this estimator appears after this list.
  • Vision: Pixel or patch-based attributions are scored to explain classifier outputs. Deletion or insertion protocols (e.g., AOPC, IDSDS) measure output drops from masking top-attributed regions. Intrinsically explainable models (e.g., BagNet-33) achieve higher attribution faithfulness than standard DNNs, and evaluation metrics that align train and test distributions (IDSDS) permit fair inter-model comparisons (Hesse et al., 16 Jul 2024).
  • Data Management: Scores such as responsibility, the Shapley value, the Banzhaf value, and the causal effect quantify the relevance of individual tuples to query answers, under formal definitions tied to actual causation, probabilistic interventions, and game-theoretic marginal contributions (Azua et al., 18 Mar 2025, Bertossi, 2023). Whether these different scores agree or diverge can be characterized in terms of query structure and the presence of exogenous tuples.
  • Anomaly Detection: Shapley-value-based attribution of anomaly scores is tailored by defining characteristic functions via local minimization around the to-be-explained point, with further heuristics to ensure computational tractability and fidelity (Takeishi et al., 2020). Generative perturbation analysis extends this by providing posteriors over per-feature attributions in regression settings, explicitly addressing deviation sensitivity and uncertainty (Idé et al., 2023).
  • Generative Models: In diffusion (score-based) generative models, data attribution is performed not by loss-based surrogates, but by direct comparisons of predicted score distributions (e.g., Diffusion Attribution Score, DAS), using first-order Taylor approximations and influence functions to efficiently calculate a training sample's contribution to outputs (Lin et al., 24 Oct 2024).
  • Boundary-based and Submodular Attribution: Recent developments include boundary-based attribution (e.g., MFABA), which accumulates attributions along the gradient ascent path to a decision boundary, offering faithfulness and order-of-magnitude acceleration over IG (Zhu et al., 2023), and ensemble methods that learn a deep submodular set function to aggregate and regularize underlying attribution maps, improving specificity and robustness (Manupriya et al., 2021).
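
As a concrete reference for the Shapley Value Sampling mentioned in the NLP item above, the sketch below estimates token-level Shapley values by permutation sampling. The scoring function f, mask token, and permutation budget are illustrative assumptions, not prescribed by the cited work:

```python
import random

def shapley_value_sampling(f, tokens, mask_token="[MASK]", n_perm=200, seed=0):
    """Permutation-sampling estimate of per-token Shapley values.

    f maps a list of tokens (some possibly masked) to a scalar model score,
    e.g. the positive-class probability. Each random permutation reveals
    tokens one at a time; the score change when token i is revealed is one
    sample of its marginal contribution, and the average over permutations
    converges to the Shapley value.
    """
    rng = random.Random(seed)
    n = len(tokens)
    phi = [0.0] * n
    for _ in range(n_perm):
        order = rng.sample(range(n), n)  # a uniformly random permutation
        current = [mask_token] * n       # start from a fully masked input
        prev = f(current)
        for i in order:
            current[i] = tokens[i]       # reveal token i
            score = f(current)
            phi[i] += score - prev       # marginal contribution sample
            prev = score
    return [p / n_perm for p in phi]
```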

3. Evaluation Protocols and Metrics

Score-based attribution analysis relies on two central classes of evaluation:

  • Plausibility (Human Agreement): Quantifies alignment of top-attributed features with human-annotated rationales, using metrics such as average precision (AP) or rank correlations. This measures how explanations resemble human reasoning, but does not guarantee faithfulness to the true model logic (Zhou et al., 8 Mar 2024, Ju et al., 2021).
  • Faithfulness (Model Sensitivity): Operationalizes the hypothesis that important features, as measured by attributions, are those whose removal most degrades model performance. Faithfulness is quantified via area under perturbation curves (AOPC, AUC), insertion/deletion protocols, or rank correlations between predicted importance and actual output drop (IDSDS in vision) (Zhou et al., 8 Mar 2024, Hesse et al., 16 Jul 2024). A minimal deletion-protocol sketch follows this list.
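
The deletion-protocol sketch referenced above, in Python; the masking value and step budget are illustrative choices, and real evaluations typically average over many inputs and use domain-appropriate replacement values:

```python
import numpy as np

def aopc_deletion(f, x, attributions, baseline_value=0.0, max_k=None):
    """Deletion-style AOPC: mean output drop as top features are removed.

    Features are masked to `baseline_value` in decreasing order of their
    attribution score; the drop f(x) - f(perturbed) is recorded after each
    deletion and averaged over the first max_k steps. Larger AOPC means
    the attribution ranking identifies genuinely influential features.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(attributions)[::-1]  # most important first
    steps = max_k if max_k is not None else len(x)
    f0 = f(x)
    perturbed = x.copy()
    drops = []
    for k in range(steps):
        perturbed[order[k]] = baseline_value  # delete the next-ranked feature
        drops.append(f0 - f(perturbed))
    return float(np.mean(drops))
```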

Perturbation strategies and target metrics must be chosen carefully, as many evaluation setups contain logic traps—e.g., circularity when a perturbation metric becomes the "true" attribution, mismatches between human and model reasoning, and the influence of adversarial or out-of-domain inputs on attributions' stability (Ju et al., 2021).

Table: Common Attribution Evaluation Metrics

| Metric | What It Measures | Typical Application |
|---|---|---|
| Average Precision | Overlap with human-annotated rationales | Plausibility (NLP) |
| AOPC / AUC | Model output drop when top-attributed features are masked | Faithfulness (vision) |
| IDSDS | Rank correlation of attribution vs. single-deletion output change | Inter-model comparison (vision) |

Rigorous evaluation further requires benchmarks that satisfy functional mapping invariance (don't retrain or alter the model), input distribution invariance, ground-truth verifiability, and metric sensitivity—these criteria are met in recent backdoor-based evaluation frameworks (Yang et al., 2 May 2024).

4. Theoretical and Practical Considerations

  • Alignment and Axiomatic Properties: The alignment of different attribution scores (e.g., responsibility, Shapley, Banzhaf) is highly query-dependent in data management and conditional on whether inputs are treated as exogenous or endogenous (Azua et al., 18 Mar 2025). In deep models, axiomatically justified methods (e.g., Shapley, IG, expected gradients) attain superior fidelity by satisfying completeness (all effects assigned), dummy (no attribution to irrelevant features), and symmetry/linearity (Jiang et al., 2023, Deng et al., 2023, Deng et al., 2021). However, strict adherence to axioms can be overly restrictive in practical settings, motivating compositional and measure-theoretic frameworks for constructing new attributions (Taimeskhanov et al., 30 May 2025).
  • Computational Scalability: Direct computation of Shapley values and interaction indices scales exponentially in feature dimension, but is made tractable via Monte Carlo sampling, marginal-averaging heuristics, subset pruning, and by leveraging decomposable model encodings (e.g., deterministic circuits for classifiers) (Jiang et al., 2023, Bertossi, 2023). Deep submodular set functions can be learned efficiently and enable budgeted, context-aware attributions (Manupriya et al., 2021).
  • Score Robustness and Specificity: Good attributions are specific (focused on the smallest contextually critical set), robust to input noise or adversarial changes (noise-stable or Lipschitz), and selective (discounting redundancy) (Manupriya et al., 2021). Probabilistic anomaly attribution (Idé et al., 2023) and submodular learning mechanisms both address the challenges of contextual uncertainty and redundancy in assigning scores.

5. Limitations, Logic Traps, and Best Practices

  • Evaluation Pitfalls: Attribution performance under common benchmarks can be confounded by logic traps, including (1) plausibility ≠ faithfulness, (2) circular dependencies between the perturbation metric and the assumed ground truth, and (3) instability that reflects genuine changes in model reasoning rather than unreliability of the attribution method (Ju et al., 2021).
  • Benchmark Design: Verifiable and high-fidelity benchmarks require alignment of input and training distributions, a known ground-truth attribution (e.g., via synthetic triggers in backdoor-based setups), and sensitive, per-feature metrics. Only a subset of existing benchmarks (e.g., BackX) meet all such criteria (Yang et al., 2 May 2024).
  • Practical Recommendations:
    • Prefer methods that satisfy completeness and correct allocation principles (e.g., IG, Shapley, Expected Gradients) for empirical fidelity (Deng et al., 2021, Deng et al., 2023).
    • For low-resource or prompt-based LLMs, Shapley Value Sampling provides the highest plausibility and faithfulness (Zhou et al., 8 Mar 2024).
    • In image attribution, evaluate raw attribution values using in-domain single-deletion rank correlation (IDSDS) and favor architectures designed for interpretability when high attribution quality is required (Hesse et al., 16 Jul 2024).
    • In anomaly and generative data settings, tailor the characteristic function or influence metric to reflect the domain-specific notion of data contribution (e.g., direct score divergence, local completion) (Takeishi et al., 2020, Lin et al., 24 Oct 2024).

6. Extensions, Variants, and Domain-Specific Innovations

  • Submodular and Boundary-based Attribution: Deep submodular functions learn ensemble attribution scores, addressing redundancy and achieving higher discriminative capacity across vision and medical datasets (Manupriya et al., 2021). Boundary-based attributions (e.g., MFABA) perform path-integral accumulation along the adversarial direction and yield both strong faithfulness and computational acceleration (Zhu et al., 2023).
  • Unified Linear Algebraic/Möbius/Taylor Viewpoints: The Weighted Möbius Score framework (Jiang et al., 2023) subsumes most attribution methods as linear reweightings over Möbius interaction dividends, enabling principled design and efficient analysis via vector space methods (the standard dividend construction is recalled after this list). The Taylor interaction frameworks further clarify allocation of both independent and interaction effects and establish completeness and fair-share criteria as key to faithful attributions (Deng et al., 2023, Deng et al., 2021).
  • Feature Attribution from First Principles: By constructing attributions for indicator functions and composing via linearity and continuity, all standard methods—including Shapley, Integrated Gradients, and partial dependence—arise as special cases of integration against appropriately chosen measures, enabling closed-form attributions for piecewise-linear networks and novel optimization strategies for metric-based feature selection (Taimeskhanov et al., 30 May 2025).
  • Score-based Attribution in Data Management: Actual causality, tuple responsibility, and Shapley-based scores define and align attribution in databases and explainable machine learning, with complexity and tractability determined by the structure of queries and by the compilation of classifiers into decomposable circuits (Bertossi, 2023, Azua et al., 18 Mar 2025).
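
For reference, the standard Möbius (Harsanyi) dividend underlying this viewpoint, together with the classical identity expressing the Shapley value as an equal split of each dividend among the coalition's members:

$$m_v(S) = \sum_{T \subseteq S} (-1)^{|S| - |T|}\, v(T), \qquad \phi_i = \sum_{S \ni i} \frac{m_v(S)}{|S|}.$$

On this view, choosing reallocation coefficients other than $1/|S|$ for the dividends is what generates the broader weighted family of score-based attributions.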

7. Research Directions and Open Challenges

  • Cross-domain Generalization: Extension of score-based attribution frameworks and evaluation metrics from image and text to settings such as generative models (e.g., diffusion), database management, and black-box anomaly detection remains an area of active research.
  • Higher-order Interactions and Global Explanations: Principled attribution of multifeature interactions and global importance across datasets or model classes challenges current scalability and interpretability. Approaches based on Taylor expansions, Möbius weights, and submodular aggregation offer promising directions.
  • Adversarial Robustness and Counterfactual Explanations: Probing attribution method stability under input perturbation, adversarial attack, or distributional shift is essential for reliable deployment and for connecting explanations to true model reasoning.
  • Evaluation Fidelity and Benchmark Quality: Ongoing work is needed to construct benchmarks satisfying functional, distributional, verifiability, and sensitivity requirements, and to develop domain-appropriate ground truth or indirect metrics when exact attribution is unattainable.
  • Optimization of Attributions: Explicit methods for tuning or learning the scoring measure (e.g., submodular function learning, measure-based composition, or reference optimization) can directly maximize desired evaluation metrics such as recall, precision, or class-specific accuracy (Taimeskhanov et al., 30 May 2025, Manupriya et al., 2021).

Score-based attribution analysis has established itself as a powerful and theoretically structured paradigm for interpreting model predictions, understanding data influence, and guiding auditing or debugging in complex systems. The field is characterized by rigorous mathematical underpinnings, evolving domain-specific extensions, and ongoing methodological scrutiny regarding evaluation correctness and real-world utility.
