
Weighted Shapley Values Explained

Updated 2 March 2026
  • Weighted Shapley values are a generalization of the classical Shapley value that assigns nonuniform weights based on coalition size, precedence, and agent priorities.
  • They extend semivalue frameworks by encoding domain-specific fairness and interpretability constraints, often using parametric distributions like Beta for weight assignment.
  • Applications span feature attribution, data valuation, and voting power analysis, achieving enhanced empirical performance and computational efficiency in real-world tasks.

A weighted Shapley value generalizes the classical Shapley value by permitting nonuniform weighting of marginal contributions according to coalition size, precedence constraints, or agent-specific priorities. Motivated by limitations of treating all coalitions and contributors as equivalently significant—a constraint often violated in practical feature attribution, data valuation, or decision-making—weighted Shapley frameworks encode cardinality- or hierarchy-dependent preferences directly into the distribution over orderings or subsets. This yields strictly more flexible and context-aware credit or importance allocations that can be tailored to diverse fairness, interpretability, or epistemic requirements across voting, explainable AI, and related domains.

1. Definitions and Generalization Beyond Uniform Shapley

The classical Shapley value assigns to player $i$:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,\big[v(S \cup \{i\}) - v(S)\big]$$

where $v(\cdot)$ is the characteristic (value) function of a cooperative game. The average runs over all possible orderings of the players and hence over all coalition sizes, with the cardinality-dependent multinomial weight counting the permutations in which $S$ precedes $i$. The uniformity assumption is that, after this scaling, every subset size contributes equally to the average.
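As a concrete reference point, the formula above can be evaluated exactly for small games. A minimal sketch in Python; the three-player glove game used here is illustrative and not drawn from the cited papers:

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact classical Shapley values for an n-player game.

    v: callable taking a frozenset S of players in {0..n-1}, returning v(S).
    Enumerates all subsets, so it runs in O(2^n) and suits only small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Uniform permutation weight |S|! (n-|S|-1)! / n! for size-k coalitions.
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = frozenset(S)
                phi[i] += w * (v(S | {i}) - v(S))
    return phi

# Illustrative 3-player glove game: value 1 iff the coalition holds the
# left glove (player 0) and at least one right glove (player 1 or 2).
v = lambda S: 1.0 if 0 in S and (1 in S or 2 in S) else 0.0
print(shapley_values(3, v))  # → [0.666..., 0.166..., 0.166...]
```

The attributions sum to $v(N) = 1$, as the efficiency axiom requires in the classical (uniform) case.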

Weighted Shapley values replace this uniform weighting with arbitrary nonnegative weights $w_k$ that depend on the size $k$ of the coalition:

$$\phi_i^{w} = \sum_{S \subseteq N \setminus \{i\}} w_{|S|}\,\big[v(S \cup \{i\}) - v(S)\big]$$

A practical reparametrization uses a parametric family for $w_k$, such as Beta distributions over $k$, or introduces further structure such as precedence constraints (partial orders or DAGs) and node- or edge-specific priorities or strengths (Kwon et al., 2022, Panda et al., 9 Mar 2025, Lee et al., 10 Feb 2026, Forré et al., 7 Oct 2025). This results in a landscape of weighted generalizations, each targeting particular domain desiderata.
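A minimal sketch of the size-weighted form, using a hypothetical Beta(2, 2) reparametrization of the weights; the exact weight mapping used in the cited work may differ:

```python
from itertools import combinations
from math import gamma

def beta_pdf(x, a, b):
    """Beta(a, b) density; used here only as an illustrative weight family."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

def weighted_shapley(n, v, size_weights):
    """phi_i^w = sum over S subset of N\{i} of w_{|S|} [v(S u {i}) - v(S)],
    with one nonnegative weight per coalition size 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                phi[i] += size_weights[k] * (v(S | {i}) - v(S))
    return phi

n = 3
# Hypothetical reparametrization: evaluate a Beta(2, 2) density at the
# normalized midpoint of each coalition size, then normalize the weights.
raw = [beta_pdf((k + 0.5) / n, 2.0, 2.0) for k in range(n)]
size_weights = [r / sum(raw) for r in raw]
v = lambda S: 1.0 if 0 in S and (1 in S or 2 in S) else 0.0
print(weighted_shapley(n, v, size_weights))
```

Passing the classical multinomial weights $k!\,(n-k-1)!/n!$ as `size_weights` recovers the uniform Shapley value, which makes the generalization easy to sanity-check.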

2. Semivalue and Priority Frameworks

Weighted Shapley values are a subclass of semivalues—solution concepts in which the weights on coalition sizes are arbitrary nonnegative numbers, often normalized for interpretability or computational tractability. The Kolmogorov–Shapley form enforces the symmetry and null-player properties but relaxes efficiency, permitting the sum of attributions $\sum_i \phi_i$ to deviate from the total value in line with domain priorities (Kwon et al., 2022).

Priority-aware variants extend these ideas to incorporate hard precedence (via partial orders encoding, e.g., causality or derivation relationships) and soft priorities (player-specific weights). The Priority-Aware Shapley Value (PASV) sets the distribution over orderings by a combination of linear extension enumeration and a multiplicative process that, at each insertion, samples among the currently maximal eligible players in accordance with their priority weights (Lee et al., 10 Feb 2026). This unifies and generalizes both the precedence structure (Precedence Shapley Value) and classical weighted random orderings (Weighted Shapley Value of Kalai–Nowak).
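The insertion process described above can be sketched as a sampler over orderings. The precedence sets and priority weights below are illustrative, and this is a sketch of the idea rather than the published PASV implementation:

```python
import random

def sample_priority_ordering(n, preds, weight, rng=random):
    """Sample one ordering consistent with a precedence DAG.

    preds[i]: set of players that must precede player i (hard precedence).
    weight[i]: positive soft-priority weight of player i.
    At each step, among players whose predecessors are all already placed,
    one is chosen with probability proportional to its weight, mirroring
    the multiplicative insertion process described in the text.
    """
    placed, order = set(), []
    while len(order) < n:
        eligible = [i for i in range(n)
                    if i not in placed and preds[i] <= placed]
        total = sum(weight[i] for i in eligible)
        r = rng.random() * total
        for i in eligible:
            r -= weight[i]
            if r <= 0:
                break
        placed.add(i)
        order.append(i)
    return order

# Player 0 must come before player 2; player 1 carries double soft priority.
preds = {0: set(), 1: set(), 2: {0}}
weight = {0: 1.0, 1: 2.0, 2: 1.0}
print(sample_priority_ordering(3, preds, weight))
```

Averaging marginal contributions over many such sampled orderings estimates the corresponding priority-aware value; with empty precedence sets and equal weights, the sampler reduces to uniform random orderings, i.e., the classical case.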

3. Theoretical Properties and Characterizations

Weighted Shapley values, and their further generalizations, satisfy several important game-theoretic and algebraic properties:

  • Linearity: Attribution is linear in the value function.
  • Symmetry: Players in symmetric situations receive equal value.
  • Null-Player: An agent contributing nothing is assigned zero.
  • (Relaxed) Efficiency: Weighted variants do not, except under special normalizations, enforce $\sum_i \phi_i = v(N) - v(\emptyset)$.

Uniqueness results are governed by further axioms; for example, for weighted Shapley values on weighted directed acyclic multigraphs (DAMGs), the addition of "weak elements" (ignoring vertices with zero synergy contribution) and "flat hierarchy" (uniformity in the case of level-wise graphs) axioms provides uniqueness (Forré et al., 7 Oct 2025). PASV is uniquely determined by axioms including State-Choice Factorization, Weight-Proportionality, and Equal-Weight Uniformity (Lee et al., 10 Feb 2026).

Weighted Shapley values also admit a weighted least-squares characterization: the unique vector of attributions minimizes

$$\min_{\psi \in \mathbb{R}^n}\ \sum_{S \subseteq N} w_{|S|}\,\Big[v(S) - \sum_{i \in S} \psi_i\Big]^2$$

where $p(S) \propto w_{|S|}$ can be interpreted as a sampling measure over subsets (Panda et al., 9 Mar 2025, Fumagalli et al., 2024).
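This least-squares characterization can be checked directly for small $n$ by solving the normal equations. The sketch below uses an illustrative additive game, for which the minimizer recovers each player's own value exactly regardless of the weights:

```python
from itertools import combinations

def wls_attributions(n, v, size_weight):
    """Solve min over psi of sum_S w_{|S|} (v(S) - sum_{i in S} psi_i)^2
    exactly via the normal equations; a sketch for small n, positive weights."""
    subsets = [frozenset(c) for k in range(n + 1)
               for c in combinations(range(n), k)]
    # Normal equations (A^T W A) psi = A^T W y, with A[S, i] = 1 iff i in S.
    M = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for S in subsets:
        w = size_weight[len(S)]
        for i in S:
            b[i] += w * v(S)
            for j in S:
                M[i][j] += w
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n):
                    M[r][c] -= f * M[col][c]
                b[r] -= f * b[col]
    return [b[i] / M[i][i] for i in range(n)]

# Additive game v(S) = sum of per-player values: the WLS fit is exact.
vals = [1.0, 2.0, 3.0]
v = lambda S: sum(vals[i] for i in S)
print(wls_attributions(3, v, size_weight=[1.0] * 4))  # → [1.0, 2.0, 3.0]
```

For non-additive games the minimizer depends on the chosen `size_weight` vector, which is exactly the degree of freedom the weighted formulation exposes.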

4. Computational Methods and Algorithms

The high computational cost of exact Shapley value evaluation persists under weighted extensions: $O(2^n)$ value evaluations for $n$ agents, unless the value function structure or weight system admits special simplifications.

Several strategies have been developed:

  • WeightedSHAP Algorithm: For feature attribution, marginal contributions $\Delta_j(x_i)$ are estimated by Monte Carlo for every feature $i$ and coalition size $j$; candidate weight vectors $w$ are then searched over a finite grid (e.g., the Beta family) to maximize a task-defined utility metric. Only the outer weighting loop and weighted averaging differ from classical SHAP (Kwon et al., 2022).
  • Amortized/Learned Approximators (FW-Shapley): Deep neural architectures are trained to minimize weighted least-squares loss matching the weighted Shapley solution, enabling real-time attributions at inference (Panda et al., 9 Mar 2025).
  • Priority-Aware MCMC Estimation: For PASV and semivalues on large or DAG-structured sets, adjacent-swap Metropolis–Hastings chains sample orderings consistent with precedence and weighting constraints, producing unbiased samples for marginal estimation (Lee et al., 10 Feb 2026).
  • Special Cases for Voting Games: For weighted voting with random or structured weights, renewal-process or balls-and-bins–inspired dynamic programming enables exact or asymptotically precise computation of expected weighted Shapley indices, particularly for quotas away from singularities (Filmus et al., 2016, Oren et al., 2014).
  • Vector-Valued and Hierarchical Settings: For values defined on general weighted DAGs and for vector-valued functions, computations rely on Möbius inversion to synergies, followed by projection and reweighting operations along paths, counting multiplicities or aggregating strengths as appropriate (Forré et al., 7 Oct 2025).
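The split between expensive marginal estimation and cheap reweighting that the WeightedSHAP bullet describes can be sketched as follows; this is an illustrative Monte Carlo loop, not the published code:

```python
import random

def marginal_contribution_table(n, v, n_samples, rng=random):
    """Monte Carlo estimate of the mean marginal contribution of each
    player i at each preceding-coalition size k, via random orderings.
    This is the expensive inner loop shared by classical and weighted
    Shapley estimators."""
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_samples):
        perm = list(range(n))
        rng.shuffle(perm)
        S, prev = frozenset(), v(frozenset())
        for k, i in enumerate(perm):
            cur = v(S | {i})
            sums[i][k] += cur - prev
            counts[i][k] += 1
            S, prev = S | {i}, cur
    return [[s / c if c else 0.0 for s, c in zip(row, cnt)]
            for row, cnt in zip(sums, counts)]

def reweight(table, size_weights):
    """Cheap outer step: recombine the cached per-size marginals under a
    candidate weight vector. Only this step changes during the grid
    search over weights, so the search itself is computationally light."""
    return [sum(w * m for w, m in zip(size_weights, row)) for row in table]

# Illustrative glove game; uniform weights 1/n recover the classical value.
v = lambda S: 1.0 if 0 in S and (1 in S or 2 in S) else 0.0
table = marginal_contribution_table(3, v, 3000)
print(reweight(table, [1/3, 1/3, 1/3]))
```

Because the table is computed once and reused for every candidate weight vector, the weight search adds only negligible cost on top of the marginal estimation, which matches the practical observation in Section 7.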

5. Applications: Feature Attribution, Data Valuation, Voting, and Beyond

  • Feature Attribution: WeightedSHAP assigns greater or lesser importance to marginal contributions depending on coalition informativeness, directly addressing cases where classical SHAP misranks features in regression or classification tasks, or when strong correlations exist (Kwon et al., 2022, Panda et al., 9 Mar 2025). For richer models, the WLS and higher-order SII extensions enable accurate assessment of feature interactions using weighted sampling schemes (Fumagalli et al., 2024).
  • Data Valuation: Weighted Shapley values are used to apportion credit among data points in training set construction. Algorithms such as Weighted KNN Shapley admit efficient quadratic-time computation with hard-label constraints and discretized weights, preserving the axiomatic fairness of Shapley allocations, even in large-scale settings (Wang et al., 2024). The PASV framework permits analyst control over data trust, risk, or hierarchy (Lee et al., 10 Feb 2026).
  • Voting Power Analysis: In multi-agent weighted voting games, stochastic generation of weights and quota selection crucially affect the Shapley-based power distribution. Weighted Shapley value analysis clarifies the emergence and structure of power plateaus, sensitivity to quotas, and linear versus square-root allocation rules under correlated preferences (Filmus et al., 2016, Kurz et al., 2016, Oren et al., 2014).
  • Hierarchical and Vector-Valued Domains: Extensions to arbitrary DAGs and abelian group–valued functions enable attribution in hierarchical systems, mereologies, and vector-valued explainable AI models (Forré et al., 7 Oct 2025).

6. Comparative Empirical Results

Empirical benchmarks for weighted Shapley methods fitted to data or domain priors report the following:

  • Weighted variants consistently outperform classical SHAP under metrics such as prediction recovery (AUP), inclusion/exclusion AUC or MSE, and label-noise recovery, often while requiring fewer features (Kwon et al., 2022, Panda et al., 9 Mar 2025).
  • Real-time amortized methods (FW-Shapley) yield markedly higher inclusion AUC and respect least-squares optimality in expectation, achieving 27% improvement over other learned Shapley attribution baselines in vision tasks (Panda et al., 9 Mar 2025).
  • Efficient data valuation via weighted KNN Shapley methods attains 10⁴–10⁶× speed-ups compared to naive combinatorial enumeration and robustly ranks points in mislabel and data selection tasks (Wang et al., 2024).
  • PASV-based attributions more faithfully represent domain knowledge, precedence, and soft priorities than either uniform or partitioned-weight Shapley values. Sensitivity analysis ("priority sweeping") detects and visualizes the stability of attributions with respect to priority weight changes, exposing robust vs. fragile allocation regimes (Lee et al., 10 Feb 2026).
  • KernelSHAP-IQ for higher-order interactions achieves lower MSE and higher precision-at-top in interaction detection versus prior sampling-based or single-pass WLS variants (Fumagalli et al., 2024).

7. Practical Considerations and Limitations

  • Computation is dominated by marginal contribution estimation, not weight optimization; grid search over weight parameters is computationally negligible in most applications (Kwon et al., 2022).
  • Weight-family selection (e.g., Beta-distributions) and downstream utility metric alignment are critical—misalignment leads to suboptimal attributions.
  • PASV and vector-DAG schemes require explicit or elicited precedence and priority information for effective operation; misspecification or cyclical dependency can degrade interpretability.
  • Relaxing classical efficiency constraints can cause total attribution to deviate from additive value, though relative ranking (and thus importance ordering) remains robust.
  • Scalability is not an inherent barrier: quadratic- or subquadratic-time algorithms, amortized neural inference, and efficient MCMC all bring weighted Shapley applications within reach for large $n$ (Kwon et al., 2022, Panda et al., 9 Mar 2025, Wang et al., 2024, Lee et al., 10 Feb 2026).
  • Direct extension beyond hard-label, discretized-utility settings in data valuation and to other learning frameworks remains open, though the architecture of combinatorial counting and random-order sampling suggests avenues for development (Wang et al., 2024, Panda et al., 9 Mar 2025).

Weighted Shapley values, in sum, constitute a rigorous, unifying, and highly adaptable foundation for apportionment of credit, importance, or responsibility in systems characterized by heterogeneity, dependency, or epistemic imprecision, offering both axiomatic transparency and empirical superiority when compared to uniform Shapley allocations.
