
Statistical Parity Difference (SPD)

Updated 13 April 2026
  • SPD is a fairness metric that measures differences in positive prediction rates between groups defined by sensitive attributes, with SPD=0 indicating parity.
  • It computes the gap between the rates of favorable outcomes for unprivileged and privileged groups, serving as a cornerstone in assessing algorithmic fairness.
  • SPD informs both regulatory frameworks and technical fairness adjustments, and extends to causal decomposition and privacy-preserving setups in empirical studies.

Statistical Parity Difference (SPD) is a foundational metric in the measurement of group fairness for supervised learning. SPD quantifies disparities in positive prediction rates between groups defined by a sensitive attribute. As a cornerstone of formal fairness definitions, SPD features prominently in both regulatory frameworks and the literature on algorithmic and data transparency.

1. Formal Definition and Properties

Statistical Parity (SP), also termed Demographic Parity, requires a classifier's positive-prediction rate to be equal across groups defined by a protected attribute S (often with S = 1 privileged and S = 0 unprivileged). In notational terms:

\Pr(\hat Y = 1 \mid S = 1) = \Pr(\hat Y = 1 \mid S = 0)

Statistical Parity Difference (SPD) measures deviation from parity, and is expressed as:

\mathrm{SPD} = \Pr(\hat Y = 1 \mid S = 0) - \Pr(\hat Y = 1 \mid S = 1)

Alternatively, some conventions swap group indices. By construction SPD=0 signifies perfect group parity in positive outcomes, while nonzero values indicate imbalance in algorithmic decisions between groups (Bargh et al., 26 Jan 2026, Krchova et al., 2023, Steen et al., 2023, Rychener et al., 2022, Plecko et al., 2023).

In implementations, for data D, classifier M, and group splits D_s = \{i \in D : S_i = s\}:

\mathrm{SPD}(M; D) = \frac{1}{|D_0|} \sum_{i \in D_0} M(x_i) - \frac{1}{|D_1|} \sum_{i \in D_1} M(x_i)
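A minimal numpy sketch of this empirical estimator (the function name is illustrative):

```python
import numpy as np

def statistical_parity_difference(y_pred, s):
    """Empirical SPD: positive-prediction rate of the unprivileged group
    (s == 0) minus that of the privileged group (s == 1)."""
    y_pred = np.asarray(y_pred)
    s = np.asarray(s)
    rate_unpriv = y_pred[s == 0].mean()
    rate_priv = y_pred[s == 1].mean()
    return rate_unpriv - rate_priv

# Toy example: unprivileged group receives 1/4 positives, privileged 3/4.
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
s      = [0, 0, 0, 0, 1, 1, 1, 1]
print(statistical_parity_difference(y_pred, s))  # -> -0.5
```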

2. Relationship to Other Fairness Criteria

SPD captures group-level fairness without requiring access to ground-truth labels. It thereby contrasts with parity criteria that use outcome information, such as Equalized Odds (EO), which demands equal true- and false-positive rates across groups:

\Pr(\hat Y = 1 \mid Y = y, S = 0) = \Pr(\hat Y = 1 \mid Y = y, S = 1), \quad y \in \{0, 1\}

This difference leads to a well-defined incompatibility: if the group base rates \Pr(Y = 1 \mid S = 0) and \Pr(Y = 1 \mid S = 1) differ, then unless the classifier is no better than random (TPR = FPR), enforcing Equalized Odds necessitates SPD ≠ 0, and vice versa (Bargh et al., 26 Jan 2026). Analytically, under Equalized Odds with shared rates TPR and FPR, each group's positive rate is \Pr(\hat Y = 1 \mid S = s) = \mathrm{FPR} + (\mathrm{TPR} - \mathrm{FPR}) \Pr(Y = 1 \mid S = s), so

\mathrm{SPD} = (\mathrm{TPR} - \mathrm{FPR}) \bigl( \Pr(Y = 1 \mid S = 0) - \Pr(Y = 1 \mid S = 1) \bigr)

Thus, base-rate imbalance precludes simultaneous achievement of SP and EO except for trivial (random, TPR = FPR) classifiers (Bargh et al., 26 Jan 2026).
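Under Equalized Odds with shared TPR and FPR, each group's positive rate is FPR + (TPR − FPR) · Pr(Y = 1 | S = s), which forces SPD = (TPR − FPR) · (Pr(Y = 1 | S = 0) − Pr(Y = 1 | S = 1)). A quick numerical check of this identity, with arbitrary illustrative values:

```python
# Numerical check: under Equalized Odds (shared TPR and FPR across groups),
# SPD = (TPR - FPR) * (p0 - p1), so SPD != 0 whenever base rates differ
# and the classifier is better than random.
tpr, fpr = 0.8, 0.2          # shared across groups under EO (illustrative)
p0, p1 = 0.3, 0.6            # group base rates Pr(Y=1 | S=s) (illustrative)

rate0 = tpr * p0 + fpr * (1 - p0)   # Pr(Yhat=1 | S=0)
rate1 = tpr * p1 + fpr * (1 - p1)   # Pr(Yhat=1 | S=1)
spd = rate0 - rate1

assert abs(spd - (tpr - fpr) * (p0 - p1)) < 1e-12
print(round(spd, 2))  # -> -0.18
```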

3. Generalization and Metric Forms

SPD is a special case of broader distributional metrics. For a classifier emitting a score R, let F_s(t) = \Pr(R \le t \mid S = s) denote the group-conditional CDFs. The classical SP criterion corresponds to the Kolmogorov distance between these CDFs:

d_{\mathrm{K}}(F_0, F_1) = \sup_t |F_0(t) - F_1(t)|

For a binary predictor \hat Y \in \{0, 1\}, this supremum is attained at t = 0, so

d_{\mathrm{K}}(F_0, F_1) = |\Pr(\hat Y = 1 \mid S = 0) - \Pr(\hat Y = 1 \mid S = 1)| = |\mathrm{SPD}|

More generally, SP can be situated within the family of Integral Probability Metrics (IPMs), such as the Wasserstein distance, total variation distance, Maximum Mean Discrepancy (MMD), and Energy Distance. This flexibility enables penalization schemes in learning objectives by extending fairness regularization beyond the Kolmogorov (sup-norm) penalty to smoother, kernel-based, or distributional penalties (Rychener et al., 2022).
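A small numpy sketch of the Kolmogorov distance between empirical group-conditional CDFs; for binary predictions it reduces to |SPD| (names are illustrative):

```python
import numpy as np

def kolmogorov_distance(r0, r1):
    """sup_t |F0(t) - F1(t)| between the empirical CDFs of two samples,
    evaluated on the pooled sample points (where the sup is attained)."""
    r0 = np.sort(np.asarray(r0, float))
    r1 = np.sort(np.asarray(r1, float))
    grid = np.concatenate([r0, r1])
    f0 = np.searchsorted(r0, grid, side="right") / len(r0)
    f1 = np.searchsorted(r1, grid, side="right") / len(r1)
    return np.max(np.abs(f0 - f1))

# For binary predictions, the distance reduces to |SPD|:
yhat0 = [1, 0, 0, 0]   # unprivileged group, positive rate 0.25
yhat1 = [1, 1, 1, 0]   # privileged group, positive rate 0.75
print(kolmogorov_distance(yhat0, yhat1))  # -> 0.5
```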

4. Causal Decomposition of SPD

Recent work decomposes SPD into path-specific causal contrasts: direct, indirect, and spurious (legally permissible) contributions. In a structural causal model with protected attribute S, mediators W, confounders Z, outcome Y, and predictor Ŷ:

  • Direct effect: flow from S to Ŷ not mediated by W
  • Indirect effect: paths from S to Ŷ mediated by W
  • Spurious effect: dependence induced by the confounders Z, which influence both S and Ŷ

The causal decomposition expresses SPD as the sum of these three contributions,

\mathrm{SPD} = \mathrm{DE} + \mathrm{IE} + \mathrm{SE}

up to the sign conventions of the particular formulation.

This unpacking of SPD is instrumental in aligning technical fairness criteria with legal doctrines such as disparate treatment (ban on direct dependence) and disparate impact (ban on unjustified indirect dependence) (Plecko et al., 2023). The notion of "business necessity" further modulates which causal effects are allowed.

5. Implementation and Optimization Practices

Regularizing for low SPD in empirical risk minimization entails practical considerations:

  • Direct minimization of empirical SPD on minibatches yields biased gradient estimates. Unbiased batch estimators for fairness penalties (e.g., squared SPD, MMD) require batch structures that ensure sufficient representation of each group (Rychener et al., 2022).
  • Efficient SGD is feasible by constructing unbiased penalties via carefully sampled mini-batches (with explicit correction factors).
  • In synthetic data generation, enforcing SPD can be achieved via post-processing methods (e.g., quantile-matching transforms) that align group-specific score distributions, with a tunable fairness-accuracy trade-off parameter (Krchova et al., 2023).
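A toy sketch of such a quantile-matching transform, mapping each group's scores onto the pooled score distribution; the function name and the interpolation knob lam are illustrative assumptions, not the method of Krchova et al.:

```python
import numpy as np

def quantile_match(scores, s, lam=1.0):
    """Map each group's scores onto the pooled score distribution.
    lam in [0, 1]: 0 keeps the original scores, 1 fully matches quantiles
    (an illustrative stand-in for a fairness-accuracy trade-off knob)."""
    scores = np.asarray(scores, float)
    s = np.asarray(s)
    out = scores.copy()
    pooled = np.sort(scores)
    for g in np.unique(s):
        mask = s == g
        grp = scores[mask]
        ranks = grp.argsort().argsort()      # within-group ranks 0..n-1
        q = (ranks + 0.5) / mask.sum()       # mid-point quantile levels
        out[mask] = (1 - lam) * grp + lam * np.quantile(pooled, q)
    return out

# After matching (lam=1), both groups share the same score values,
# so every threshold yields SPD = 0 on this sample.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
adjusted = quantile_match(scores, groups, lam=1.0)
```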

Empirical evidence consistently demonstrates that accuracy-fairness trade-offs are dataset- and metric-sensitive, with regularized methods often achieving superior optimization speed and parity control compared to naïve approaches (Rychener et al., 2022, Krchova et al., 2023).

6. Extensions: Threshold Independence and Strong SP

"Strong" statistical parity requires SPD(t) = 0 for every classification threshold t, not just for a fixed one. For a continuous-score classifier with score R, this enforces:

\Pr(R > t \mid S = 0) = \Pr(R > t \mid S = 1) \quad \text{for all } t

i.e., equality of the group-conditional score distributions. This ensures demographic parity at every decision threshold, which is desirable in settings where thresholds may be set post hoc or adjusted downstream (Krchova et al., 2023).
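Strong SP can be audited empirically by scanning all observed thresholds; a minimal sketch (essentially the two-sample Kolmogorov-Smirnov statistic computed via the survival function; names are illustrative):

```python
import numpy as np

def max_threshold_spd(r, s):
    """max_t |Pr(R > t | S=0) - Pr(R > t | S=1)| over observed thresholds.
    Zero (up to sampling noise) iff strong statistical parity holds."""
    r = np.asarray(r, float)
    s = np.asarray(s)
    r0, r1 = r[s == 0], r[s == 1]
    gaps = [abs((r0 > t).mean() - (r1 > t).mean()) for t in np.unique(r)]
    return max(gaps)

# Disjoint score ranges: at t = 0.2 the per-threshold SPD is maximal.
print(max_threshold_spd([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))  # -> 1.0
# Identical score distributions: parity holds at every threshold.
print(max_threshold_spd([0.1, 0.2, 0.1, 0.2], [0, 0, 1, 1]))  # -> 0.0
```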

7. Differential Privacy and Auditing SPD

Estimation of SPD under privacy constraints can be efficiently realized in settings such as decision trees using the PAFER procedure. PAFER leverages disjoint histogram queries per leaf and per group, uses the Laplace mechanism for differential privacy, and preserves high accuracy in SPD estimation with minimal privacy budget consumption (Steen et al., 2023). The error of differentially-private SPD estimation scales inversely with the privacy budget ε and grows only minimally with the number of leaves, provided histogram disjointness is exploited.

Algorithm             Mechanism    SPD Estimation Error
PAFER + Laplace       (ε, 0)-DP    ~0.02 at ε = 0.5
PAFER + Exponential   (ε, 0)-DP    bias > 0.2
PAFER + Gaussian      (ε, δ)-DP    error ~0.25–0.3

Privacy-preserving estimation becomes challenging in small datasets, deep trees, or with multi-group sensitive features due to increased noise and granularity effects.
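As a simplified illustration (not the PAFER procedure itself), SPD can be estimated from Laplace-perturbed group counts; the per-group budget split and the clipping choices below are assumptions of this sketch:

```python
import numpy as np

def dp_spd_estimate(y_pred, s, eps, rng=None):
    """Illustrative Laplace-mechanism SPD estimate: perturb each group's
    positive count and group size. Two count queries per group, each with
    budget eps/2 (count queries have sensitivity 1 -> scale 2/eps)."""
    if rng is None:
        rng = np.random.default_rng()
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    rates = {}
    for g in (0, 1):
        pos = float((y_pred[s == g] == 1).sum())
        n = float((s == g).sum())
        noisy_pos = pos + rng.laplace(scale=2.0 / eps)
        noisy_n = max(n + rng.laplace(scale=2.0 / eps), 1.0)  # avoid div by ~0
        rates[g] = min(max(noisy_pos / noisy_n, 0.0), 1.0)    # clip to [0, 1]
    return rates[0] - rates[1]

# 10,000 records: group 0 has a 30% positive rate, group 1 has 50%.
y = np.repeat([1, 0, 1, 0], [1500, 3500, 2500, 2500])
s = np.repeat([0, 1], 5000)
print(dp_spd_estimate(y, s, eps=0.5, rng=np.random.default_rng(0)))
# close to the true SPD of -0.2; noise dominates on small groups
```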


In sum, the Statistical Parity Difference undergirds much of contemporary algorithmic fairness discourse, offering a tractable, distribution-free measure of group-level allocation. Its precise definition, compatibility with legal doctrines, mathematical links to other fairness criteria, generalization to strong and distributional forms, and auditability in private settings make SPD a persistent focal point for both theoretical research and practical deployment (Rychener et al., 2022, Plecko et al., 2023, Krchova et al., 2023, Bargh et al., 26 Jan 2026, Steen et al., 2023).
