Fairness-Preserving Label Propagation

Updated 30 May 2026

Fairness-preserving label propagation is a graph-based method that diffuses label corrections across data to reduce bias and improve predictive accuracy.
It integrates fairness constraints, such as demographic parity and worst-group accuracy, into techniques for applications like ASR rescoring and fraud detection.
Empirical studies show significant improvements in metrics like WER, AUC, and worst-group accuracy by effectively mitigating label noise and group disparities.

Fairness-preserving label propagation encompasses a family of algorithms that combine graph-based label correction with explicit fairness constraints to mitigate group-level disparities in machine learning systems, particularly under noisy or biased supervision. These systems exploit the geometry of a data graph, constructed from pairwise similarities or latent representations, to spread information, adjust labels, or rescore candidate outputs, with the aim of both improving overall predictive accuracy and reducing performance disparities across demographic or latent subgroups.

1. Principles and Motivation

Fairness-preserving label propagation is motivated by persistent evidence that machine learning systems display disparate error rates across demographic groups, especially in settings such as speech recognition or binary classification, where label noise or spurious correlations further exacerbate inequity. The objectives are twofold: (i) robustly correct or assign labels based on local and global structure in the data, and (ii) explicitly balance or regularize group-wise parity metrics, such as worst-group error or demographic parity. The central tool is a data graph in which nodes represent samples (utterances, instances, etc.) and edges encode some notion of affinity, enabling label diffusion or collaborative scoring. Empirical studies demonstrate that incorporating fairness-aware propagation into label correction, rescoring, or last-layer retraining pipelines can substantially narrow accuracy gaps between majority and minority groups without requiring explicit demographic supervision in every component (Tankasala et al., 2023, Sulaiman et al., 18 Jun 2025, Stromberg et al., 2024).

2. Graph Construction and Similarity Metrics

The effectiveness of label propagation is contingent on the construction of an informative graph. Distinct methodologies are used in various domains:

Acoustic-based Graphs: In ASR rescoring, nodes represent utterances and edge weights derive from acoustic similarity. Specifically, Tankasala et al. construct a fully connected graph among overlapping utterances, where edge affinity is a thresholded, length-normalized dependent DTW distance between final-layer RNN-T frame embeddings. The binary affinity matrix $W$ is symmetrized and normalized to produce the propagation operator $S = \Delta^{-1/2}W\Delta^{-1/2}$, where $\Delta$ is the degree matrix (Tankasala et al., 2023).
kNN Graphs in Feature Space: For tabular or vision datasets, a $k$-nearest-neighbor (kNN) graph is constructed using distances in the raw or latent feature space. In GFLC, edge weights are set by inverse Euclidean distance, with Ricci-flow-based reweighting to regularize graph geometry over several iterations (Sulaiman et al., 18 Jun 2025). In domain-agnostic correction, nearest-neighbor graphs are built on pretrained representation embeddings, with a uniform row-stochastic propagator $P=V_k/k$ (Stromberg et al., 2024).

The choice of similarity metric and strategy for edge weighting or sparsification (e.g., hard thresholds, RBF kernels, curvature-regularized updates) determines the locality and robustness of label diffusion.
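The graph-construction step can be sketched in a few lines. The example below is a minimal illustration, not any cited paper's implementation: it builds a symmetric kNN graph with inverse-Euclidean-distance weights and applies the symmetric normalization $S = \Delta^{-1/2}W\Delta^{-1/2}$. The function names `knn_affinity` and `normalize`, the toy points, and the "keep an edge if either endpoint selects it" symmetrization are assumptions made for illustration; DTW affinities and Ricci-flow reweighting are omitted.

```python
import math

def knn_affinity(points, k):
    """Symmetric kNN affinity matrix with inverse-Euclidean-distance weights."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = 1.0 / (d + 1e-12)  # small constant guards against zero distance
            # Symmetrize (assumption): keep the edge if either endpoint selects it.
            W[i][j] = max(W[i][j], w)
            W[j][i] = max(W[j][i], w)
    return W

def normalize(W):
    """Return S = D^{-1/2} W D^{-1/2}, where D is the diagonal degree matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy data: two well-separated clusters in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
W = knn_affinity(points, k=2)
S = normalize(W)
```

Plain Python lists keep the sketch dependency-free; at realistic scale an array library and an approximate nearest-neighbor index would be the more natural choice.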

3. Fairness-aware Label Propagation Algorithms

Multiple algorithmic instantiations implement fairness-preserving label propagation:

3.1 Soft Label Smoothing and Collaborative Rescoring

In cross-utterance ASR, the N-best hypotheses from each utterance are gathered into a candidate label set. The initial probability assignments $Y^{(0)}$ are propagated via $Y^{(t+1)} = \alpha S Y^{(t)} + (1-\alpha) Y^{(0)}$, with smoothing factor $\alpha$, until convergence to $Y^{(\infty)}$. The final output for each utterance $i$ is the hypothesis that maximizes row $Y^{(\infty)}_{i,*}$. This collaborative diffusion encourages acoustically similar utterances, often across underrepresented accents, to reinforce non-majority label paths, empirically reducing accent-driven WER disparities without explicit accent-group fine-tuning (Tankasala et al., 2023).
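The update rule above can be iterated directly. The sketch below is a minimal illustration, assuming a shared candidate set of two hypotheses and three utterances joined by a uniform, fully connected normalized graph; the function name `propagate`, the tolerance, and the toy scores are hypothetical and not taken from the cited work.

```python
def propagate(S, Y0, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate Y <- alpha * S Y + (1 - alpha) * Y0 until the largest change < tol."""
    n, c = len(Y0), len(Y0[0])
    Y = [row[:] for row in Y0]
    for _ in range(max_iter):
        Y_next = [
            [
                alpha * sum(S[i][j] * Y[j][m] for j in range(n))
                + (1 - alpha) * Y0[i][m]
                for m in range(c)
            ]
            for i in range(n)
        ]
        delta = max(abs(Y_next[i][m] - Y[i][m]) for i in range(n) for m in range(c))
        Y = Y_next
        if delta < tol:
            break
    return Y

# Normalized operator for three mutually connected utterances (each degree is 2).
S = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
# Rows: utterances; columns: initial scores for two candidate hypotheses.
# Utterance 2 initially, and narrowly, prefers hypothesis 1.
Y0 = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55]]

Y_inf = propagate(S, Y0, alpha=0.8)
# Decode each utterance as the index of its highest-scoring hypothesis.
decoded = [max(range(len(row)), key=row.__getitem__) for row in Y_inf]
```

In this toy setting the two confident neighbors pull utterance 2 over to hypothesis 0. For $\alpha < 1$ and a normalized $S$, the iteration converges to the fixed point $Y^{(\infty)} = (1-\alpha)(I - \alpha S)^{-1} Y^{(0)}$, so a linear solve is an alternative to iterating when the graph is small.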

3.2 Combined Score and Demographic Parity-driven Correction

GFLC introduces an explicit scoring function integrating margin-based model uncertainty, a Ricci-flow-Laplacian smoothness penalty, and an incremental fairness gain on demographic parity. For each node $i$, the three terms are combined into a single correction score, written schematically (up to per-term weighting) as

$\mathrm{score}(i) = U_i + L_i + F_i$

where $U_i$ prioritizes ambiguous (low-confidence) labels, $L_i$ is the curvature-regularized Laplacian term, and $F_i$ quantifies the shift in demographic parity from hypothetically flipping $y_i$ (Sulaiman et al., 18 Jun 2025). Top-scoring candidates are then flipped subject to prevalence constraints, and the final classifier is retrained on corrected labels.
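To make the three ingredients of such a correction score concrete, the following sketch ranks binary labels as flip candidates. It is an illustrative stand-in under stated assumptions, not GFLC's actual scoring function: the margin term `1 - |2p - 1|`, the weighted neighbor-disagreement term (a simple proxy for Laplacian smoothness, with no Ricci-flow reweighting), the min/max parity ratio, the equal default weights, and all names and toy data are hypothetical.

```python
def dp_ratio(labels, groups):
    """Min/max ratio of positive-label rates across groups (1.0 means parity)."""
    rates = []
    for g in sorted(set(groups)):
        members = [y for y, a in zip(labels, groups) if a == g]
        rates.append(sum(members) / len(members))
    return min(rates) / max(rates) if max(rates) > 0 else 1.0

def flip_scores(probs, labels, groups, W, weights=(1.0, 1.0, 1.0)):
    """Score each node as a candidate for a label flip (higher = flip first)."""
    w_u, w_s, w_f = weights
    base = dp_ratio(labels, groups)
    scores = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Margin-based uncertainty: 1 at p = 0.5, 0 at p in {0, 1}.
        uncertainty = 1.0 - abs(2.0 * p - 1.0)
        # Smoothness proxy: weighted share of neighbors whose label differs from y.
        total = sum(W[i])
        disagreement = (
            sum(w for w, yj in zip(W[i], labels) if yj != y) / total if total else 0.0
        )
        # Fairness gain: change in the parity ratio if label i were flipped.
        flipped = labels[:i] + [1 - y] + labels[i + 1:]
        gain = dp_ratio(flipped, groups) - base
        scores.append(w_u * uncertainty + w_s * disagreement + w_f * gain)
    return scores

# Hypothetical toy data: six nodes, two protected groups, uniform full graph.
groups = [0, 0, 0, 1, 1, 1]
labels = [1, 1, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.2, 0.1, 0.5, 0.9]  # model's P(y = 1) for each node
W = [[0.0 if i == j else 1.0 for j in range(6)] for i in range(6)]

scores = flip_scores(probs, labels, groups, W)
best = max(range(len(scores)), key=scores.__getitem__)
```

Here node 4 ranks first: the model is maximally uncertain about it, most of its neighbors carry the opposite label, and flipping it equalizes the positive rates of the two groups. A prevalence constraint of the kind mentioned above would then cap how many top-ranked candidates are actually flipped.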

3.3 Latent kNN-based Majority-vote Label Spreading

In domain-agnostic last-layer retraining, each example’s observed label is iteratively replaced by a majority vote among its $k$ nearest neighbors’ labels in latent space. This correction is performed independently of group membership, after which standard subgroup-fair two-stage last-layer retraining (e.g., RAD, SELF) is applied to the newly assigned labels. The method is robust under symmetric label noise, restoring worst-group accuracy without explicit use of group labels in the correction phase (Stromberg et al., 2024).
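The majority-vote correction can be sketched as follows. This is a minimal illustration under assumptions not fixed by the text: neighbors exclude the point itself, all labels are updated synchronously in each round, and an odd $k$ avoids ties for binary labels. The names `knn_indices` and `spread_labels` and the toy embeddings are hypothetical.

```python
import math
from collections import Counter

def knn_indices(embeddings, k):
    """For each point, indices of its k nearest neighbors (self excluded)."""
    n = len(embeddings)
    result = []
    for i in range(n):
        order = sorted(
            (math.dist(embeddings[i], embeddings[j]), j) for j in range(n) if j != i
        )
        result.append([j for _, j in order[:k]])
    return result

def spread_labels(embeddings, labels, k=3, rounds=1):
    """Replace each label by the majority vote of its k nearest neighbors."""
    neighbors = knn_indices(embeddings, k)
    current = list(labels)
    for _ in range(rounds):
        # Synchronous update: every vote in a round reads the previous round's labels.
        current = [
            Counter(current[j] for j in nbrs).most_common(1)[0][0]
            for nbrs in neighbors
        ]
    return current

# Hypothetical latent embeddings: two clusters, one noisy label in each.
embeddings = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
]
noisy = [0, 0, 1, 0, 1, 1, 1, 0]  # indices 2 and 7 carry flipped labels
corrected = spread_labels(embeddings, noisy, k=3)
```

No group labels appear anywhere in the correction, which matches the group-agnostic nature of this step; the fairness-aware retraining happens afterwards on the corrected labels.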

4. Fairness Metrics and Theoretical Considerations

Fairness is quantified by diverse group-based metrics adapted to the particular context:

Group-specific WER/SER and $\Delta_{\mathrm{WER}}$: In ASR evaluation, word error rate (and sentence error rate) is reported per accent group, and the fairness gap is the spread between the worst and best groups, $\Delta_{\mathrm{WER}} = \max_g \mathrm{WER}_g - \min_g \mathrm{WER}_g$ (Tankasala et al., 2023).
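Demographic Parity Ratio: GFLC targets balanced positive prediction rates between protected groups, with the main metric being

$\mathrm{DP} = \frac{\min_a P(\hat{Y}=1 \mid A=a)}{\max_a P(\hat{Y}=1 \mid A=a)}$

where $A$ is the protected attribute and $\hat{Y}$ the predicted label. A value of 1 indicates perfect parity (Sulaiman et al., 18 Jun 2025).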

Worst-group Accuracy (WGA): For subgroup fairness without group labels, WGA is defined as the minimum accuracy across all latent (possibly hidden) groups, with the goal that the last-layer retrained model maximizes this worst-case subgroup metric (Stromberg et al., 2024).
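The three metrics above are straightforward to compute once per-group statistics are available. The sketch below is illustrative only: the max-minus-min form of the WER gap and the min/max form of the parity ratio are common conventions assumed here, and the function names and toy numbers are hypothetical rather than results from the cited papers.

```python
def wer_gap(wer_by_group):
    """Spread between the worst and best per-group word error rates."""
    return max(wer_by_group.values()) - min(wer_by_group.values())

def demographic_parity_ratio(preds, groups):
    """Min/max ratio of positive-prediction rates across groups (1.0 = parity)."""
    rates = []
    for g in sorted(set(groups)):
        members = [p for p, a in zip(preds, groups) if a == g]
        rates.append(sum(members) / len(members))
    return min(rates) / max(rates) if max(rates) > 0 else 1.0

def worst_group_accuracy(preds, labels, groups):
    """Minimum accuracy over groups."""
    accuracies = []
    for g in sorted(set(groups)):
        pairs = [(p, y) for p, y, a in zip(preds, labels, groups) if a == g]
        accuracies.append(sum(p == y for p, y in pairs) / len(pairs))
    return min(accuracies)

# Hypothetical toy data, not taken from any cited experiment.
preds = [1, 0, 1, 1, 0, 0]
labels = [1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]

gap = wer_gap({"A": 0.10, "B": 0.14})
dp = demographic_parity_ratio(preds, groups)
wga = worst_group_accuracy(preds, labels, groups)
```

In the toy data, group "a" receives positive predictions twice as often as group "b", giving a parity ratio of 0.5, and its accuracy of 2/3 sets the worst-group accuracy.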

Theoretical guarantees focus primarily on the noise-robustness of the propagation phase (e.g., bounds for $k$-NN under symmetric noise) and the empirical effect of graph denoising via Ricci flow (Stromberg et al., 2024, Sulaiman et al., 18 Jun 2025). Formal fairness guarantees are generally not provided but are evaluated empirically.

5. Empirical Results and Applications

Experimental findings across domains substantiate the utility of fairness-aware label propagation:

ASR Rescoring: On the VCTK corpus, cross-utterance label propagation reduces baseline WER from 9.99% to 5.64%, with the largest relative gains for non-American accent groups (e.g., Indian: 13.26%→7.94%), nearly halving the WER gap across accents. The method also reduces SER by over 40% (Tankasala et al., 2023).
Bank Fraud Detection (GFLC): With up to 20% group-specific label noise, GFLC achieves AUCs of 0.799–0.874 and maintains demographic parity ratios near 1.0, outperforming baseline methods whose fairness ratios degrade at higher noise rates (Sulaiman et al., 18 Jun 2025).
Domain-agnostic Fair Corrections: kNN-based label spreading, as a preprocessing step to RAD/SELF, restores state-of-the-art WGA under significant (30–40%) symmetric label noise on both vision and text datasets. The procedure is modular and does not require explicit demographic labels at training time (Stromberg et al., 2024).

Applications extend from speech recognition and fraud detection to any setting where label noise and group imbalance co-occur, including healthcare, social media, and document classification.

6. Algorithmic Summaries

The following table illustrates principal features of prominent fairness-preserving label propagation methods:

Approach	Graph Type	Fairness Objective
ASR-Graph-LP (Tankasala et al., 2023)	Acoustic DTW-based	Reduce accent WER gap
GFLC (Sulaiman et al., 18 Jun 2025)	Feature kNN + Ricci	Demographic parity
kNN Label Spreading (Stromberg et al., 2024)	Latent kNN	Worst-group accuracy

GFLC is the only one of the three to combine margin-based uncertainty, Laplacian smoothness, and explicit demographic parity gain in a unified correction score. The ASR rescoring method operates on string-valued labels with softmax-initialized scores and improves fairness through graph smoothing across accent boundaries rather than through an explicit fairness term. Latent kNN label propagation achieves group-agnostic fairness improvement by majority-vote label correction in representation space.

7. Discussion, Generalizations, and Limitations

Fairness-preserving label propagation advances the state of the art for equitable learning under noisy or incomplete supervision. By diffusing label information across a similarity graph and biasing correction toward parity, these methods can bridge accuracy gaps in both explicitly labeled (e.g., accent or age) and latent group settings. Limitations include the absence of formal convergence or group-fairness guarantees, and computational costs associated with constructing and updating large kNN graphs, especially in high-dimensional or large-scale datasets. No explicit domain adaptation is performed, but thresholds and propagation parameters are often tuned on development splits. A plausible implication is that propagating labels via graphs in other representation spaces (e.g., images, text) and combining with downstream fairness-aware objectives could extend these gains to broader modalities (Tankasala et al., 2023, Sulaiman et al., 18 Jun 2025, Stromberg et al., 2024).


References

Tankasala et al. (2023). Cross-utterance ASR Rescoring with Graph-based Label Propagation.

Sulaiman et al. (2025). GFLC: Graph-based Fairness-aware Label Correction for Fair Classification.

Stromberg et al. (2024). Label Noise Robustness for Domain-Agnostic Fair Corrections via Nearest Neighbors Label Spreading.
