
FedFair-100 Benchmark for Federated Fairness

Updated 25 January 2026
  • FedFair-100 is a synthetic tabular benchmark designed to stress-test privacy-preserving and fairness-aware federated learning protocols under extreme non-IID data conditions.
  • It simulates 100 federated clients with imbalanced, realistic demographic distributions, enforcing label–attribute correlations and highlighting fairness challenges.
  • The benchmark measures utility via accuracy, F1-score, and AUROC, and evaluates fairness through demographic parity and equalized odds under varying privacy budgets.

FedFair-100 is a synthetic tabular benchmark specifically designed to evaluate privacy-preserving and fairness-aware federated learning protocols under conditions of extreme heterogeneity and non-IID data partitioning. Developed for stress-testing fairness verification schemes, it emulates large-scale demographic variation and label–attribute relationships using statistical calibration to U.S. census distributions. As a primary testbed in CryptoFair-FL, FedFair-100 enables rigorous analysis of demographic parity and equalized odds violations—alongside utility—across privacy budgets, delivering reproducible results for comparison of federated and centralized approaches (Ali et al., 18 Jan 2026).

1. Construction and Data Generation

FedFair-100 comprises one million records (M = 1,000,000) generated from mixtures of tabular distributions. Parameters, including age, income, and education, are selected to match realistic U.S. census statistics. Each record is a triplet (x, y, a), where x denotes the feature vector in ℝ^d, y the binary label, and a the binary protected attribute. Across 100 simulated institutional clients, the prevalence of the protected attribute a = 1 varies uniformly from 5% to 45%, systematically inducing strong non-IID data heterogeneity. Conditional label–attribute probabilities P[y = 1 ∣ a] are institution-specific, generating diverse fairness challenges. Preprocessing normalizes features to zero mean and unit variance per institution; binary features are encoded as {0, 1}, categorical features are one-hot encoded, and continuous features are clipped to [-5, 5] after normalization.
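The per-client generation scheme can be sketched as below. The Gaussian feature model and the 0.5/0.3 shift coefficients are illustrative assumptions, since the benchmark's exact mixture parameters are not reproduced in this summary; only the prevalence sweep, the conditional label probabilities, and the normalize-then-clip preprocessing are taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_client_data(m, p_a, p_y_given_a, d=30):
    """Generate one client's (x, y, a) triplets.

    p_a         -- client-level prevalence P[a = 1], drawn from [0.05, 0.45]
    p_y_given_a -- pair (P[y = 1 | a = 0], P[y = 1 | a = 1])

    The Gaussian feature model and its shift coefficients are
    illustrative assumptions, not the benchmark's exact mixture.
    """
    a = (rng.random(m) < p_a).astype(int)
    y = (rng.random(m) < np.asarray(p_y_given_a)[a]).astype(int)
    x = rng.normal(size=(m, d)) + 0.5 * y[:, None] + 0.3 * a[:, None]
    # Per-institution preprocessing: zero mean, unit variance, clip to [-5, 5].
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    x = np.clip(x, -5.0, 5.0)
    return x, y, a

x, y, a = make_client_data(10_000, p_a=0.25, p_y_given_a=(0.3, 0.6))
```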

2. Data Partitioning and Client Distribution

FedFair-100 simulates 100 federated learning clients (n = 100), each assigned approximately m_i ≈ M/n samples, perturbed by ±20% to reflect realistic institutional imbalance. The partitioning is highly non-IID: each client i is characterized by its protected-attribute rate p_i = P[a = 1] ∈ [0.05, 0.45] and by its conditional label–attribute probabilities. This protocol enforces client-level demographic and label correlations, mirroring regulatory and real-world deployment scenarios.

| Attribute | Range/Type | Handling |
|---|---|---|
| Client count (n) | 100 | Fixed simulation |
| Records per client | ≈ 10,000 ± 20% | Imbalanced, non-IID |
| P[a = 1] per client | [0.05, 0.45] | Uniform sweep |
| P[y = 1 ∣ a] per client | Varied client-wise | Induces fairness variations |
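The partitioning above can be sketched in a few lines. Rescaling the perturbed shares so they sum exactly to M is an assumption; the text specifies only the ±20% perturbation around M/n and the uniform prevalence sweep.

```python
import numpy as np

rng = np.random.default_rng(1)
M, n = 1_000_000, 100

# Perturb each client's share by up to ±20% around M/n, then rescale so
# the sizes sum exactly to M (the exact perturbation scheme is assumed).
raw = (M / n) * rng.uniform(0.8, 1.2, size=n)
sizes = np.round(raw * M / raw.sum()).astype(int)
sizes[-1] += M - sizes.sum()          # absorb rounding error

# Uniform sweep of protected-attribute prevalence across clients.
p = np.linspace(0.05, 0.45, n)
```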

3. Feature, Label, and Protected Attribute Spaces

The feature space has d ≈ 30 dimensions, spanning mixed continuous and categorical tabular variables. The label space is binary: Y = {0, 1}. The protected attribute A ∈ {0, 1} is explicitly represented for each record and retained locally at the client level, never centralized, satisfying privacy constraints. The client-wise variation in P[A = 1] across institutions is central to the benchmark's role in evaluating fairness enforcement.

4. Benchmark Tasks, Model Classes, and Evaluation Settings

FedFair-100 assesses binary classification under federated learning with explicit fairness constraints. Standard model classes are logistic regression or small neural networks f_θ : ℝ^d → [0, 1]. Evaluation pairs fairness metrics (demographic parity (DP) and equalized odds (EO)) with utility measures: classification accuracy, F1-score, and AUROC. Data are split per client into 80/20 train/test partitions stratified by label and protected attribute. The protocol stipulates full client participation (q = 1) in each of ~100 communication rounds (FedAvg) or ~106 rounds (CryptoFair-FL) to convergence. Privacy budgets ε are swept from 0.25 to 2.0 in increments of 0.25; each experimental condition is repeated over 5 independent runs, reporting mean and standard deviation.
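The per-client stratified split can be sketched as follows. Stratifying jointly on the four (y, a) cells is an assumption about how "stratified by label and protected attribute" is implemented; the benchmark's exact splitting code is not shown in this summary.

```python
import numpy as np

rng = np.random.default_rng(2)

def stratified_split(y, a, test_frac=0.2):
    """80/20 train/test split stratified jointly on (label, attribute).
    A minimal sketch; the benchmark's exact splitting code is assumed."""
    y, a = np.asarray(y), np.asarray(a)
    strata = 2 * y + a                      # four (y, a) cells
    train, test = [], []
    for s in np.unique(strata):
        idx = rng.permutation(np.flatnonzero(strata == s))
        k = int(round(test_frac * len(idx)))
        test.append(idx[:k])
        train.append(idx[k:])
    return np.concatenate(train), np.concatenate(test)

y = rng.integers(0, 2, size=1000)
a = rng.integers(0, 2, size=1000)
train_idx, test_idx = stratified_split(y, a)
```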

5. Formal Fairness Metric Definitions

FedFair-100 operationalizes explicit definitions for fairness:

Demographic Parity Violation

Δ_DP(θ) = | P[Ŷ = 1 ∣ A = 0] − P[Ŷ = 1 ∣ A = 1] |

where Ŷ = 1{f_θ(X) > 0.5}.
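A minimal sketch of this metric on predicted labels and attributes:

```python
import numpy as np

def dp_violation(y_hat, a):
    """Empirical demographic parity gap
    |P[ŷ = 1 | a = 0] - P[ŷ = 1 | a = 1]|."""
    y_hat, a = np.asarray(y_hat), np.asarray(a)
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())
```

For example, with predictions [1, 0, 1, 1] and attributes [0, 0, 1, 1], the group-1 positive rate is 1.0 and the group-0 rate is 0.5, giving a gap of 0.5.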

Equalized Odds Violation

Δ_EO(θ) = max_{y ∈ {0,1}} | P[Ŷ = 1 ∣ A = 0, Y = y] − P[Ŷ = 1 ∣ A = 1, Y = y] |
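The corresponding empirical estimator can be sketched as:

```python
import numpy as np

def eo_violation(y_hat, y, a):
    """Empirical equalized odds gap: worst-case group difference in
    P[ŷ = 1 | a, Y = y] over y in {0, 1}. Assumes every (a, y) cell is
    non-empty, which the stratified splits described above should ensure."""
    y_hat, y, a = (np.asarray(v) for v in (y_hat, y, a))
    gaps = []
    for yy in (0, 1):
        r0 = y_hat[(a == 0) & (y == yy)].mean()
        r1 = y_hat[(a == 1) & (y == yy)].mean()
        gaps.append(abs(r0 - r1))
    return max(gaps)
```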

In practice, fairness metrics are computed by aggregating client-wise counts,

S_{a,ŷ} = Σ_{i=1}^{n} Σ_{j=1}^{m_i} 1{ a_j^{(i)} = a, ŷ_j^{(i)} = ŷ },  for a, ŷ ∈ {0, 1}

The decrypted noisy sums S̃_{a,ŷ} yield empirical rates P̂_a for DP, where Δ̂_DP = | P̂_0 − P̂_1 |; the analogous procedure applies for EO.
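The estimation step above can be sketched as a one-shot computation on the aggregated count table. The threshold decryption is omitted, and a single noise draw per count stands in for the protocol's full pipeline, so this is an illustration of the arithmetic rather than the protocol itself.

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_dp_estimate(counts, eps):
    """Estimate Δ_DP from aggregated counts S_{a, ŷ} (rows a, columns ŷ)
    after adding Laplace noise of scale Δs/ε with Δs = 1. A one-shot
    sketch: the protocol's threshold decryption is omitted."""
    noisy = counts + rng.laplace(scale=1.0 / eps, size=counts.shape)
    p_hat = noisy[:, 1] / noisy.sum(axis=1)   # empirical rates P̂_a
    return abs(p_hat[0] - p_hat[1])

counts = np.array([[7000.0, 3000.0],   # a = 0: true P_0 = 0.30
                   [4000.0, 6000.0]])  # a = 1: true P_1 = 0.60
est = noisy_dp_estimate(counts, eps=1.0)   # close to |0.30 - 0.60| = 0.30
```

With counts in the thousands, Laplace noise of scale 1/ε barely moves the empirical rates, which is why fairness auditing over aggregated sums tolerates tight privacy budgets.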

6. Experimental Protocol Details and Cryptographic Privacy

FedFair-100 adopts an 80/20 stratified train–test split by label and protected attribute. Communication proceeds over 100–106 rounds per run, with no client subsampling; all 100 clients participate in every round. CryptoFair-FL utilizes threshold decryption in a k-of-n scheme and has each client add Laplace noise of scale Δ_s/ε (with sensitivity Δ_s = 1) for differential privacy. Privacy accounting leverages advanced composition via the moments accountant, consuming the total privacy budget ε over all rounds.
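The client-side noising step can be sketched as below. The k-of-n threshold decryption and the moments-accountant composition are outside this sketch, and splitting the budget per round is left to the accountant; the sensitivity Δ_s = 1 follows from one record changing each count by at most 1.

```python
import numpy as np

rng = np.random.default_rng(4)

def privatize_counts(local_counts, eps_round, sensitivity=1.0):
    """Client-side Laplace mechanism: add noise of scale Δs/ε_round
    (Δs = 1, since one record changes each count by at most 1) before
    encryption and aggregation. Threshold k-of-n decryption and the
    moments-accountant composition are outside this sketch."""
    noise = rng.laplace(scale=sensitivity / eps_round,
                        size=np.shape(local_counts))
    return np.asarray(local_counts, dtype=float) + noise

# One client's 2x2 count table S_{a, ŷ}, privatized for a single round.
noisy = privatize_counts([[120, 30], [80, 70]], eps_round=0.5)
```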

7. Baseline Protocols and Key Results

FedFair-100 provides comparative reference for several federated learning paradigms:

| Baseline | Δ_DP (at ε = 1.0) | AUROC |
|---|---|---|
| FedAvg | ≈ 0.182 | ≈ 0.86 |
| Local Fair | ≈ 0.095 | ≈ 0.84 |
| SecAgg-NoFair | Uncontrolled | |
| CryptoFair-FL | 0.033 | 0.85 |
| Centralized Fair | 0.018 | 0.86 |

A tradeoff curve of Δ_DP versus ε for CryptoFair-FL on FedFair-100 shows the fairness violation shrinking monotonically as the privacy budget grows:

  • ε = 0.25 → Δ_DP ≈ 0.182
  • ε = 0.50 → Δ_DP ≈ 0.091
  • ε = 0.75 → Δ_DP ≈ 0.054
  • ε = 1.00 → Δ_DP ≈ 0.033
  • ε = 1.50 → Δ_DP ≈ 0.022
  • ε = 2.00 → Δ_DP ≈ 0.017

Experimental results demonstrate that CryptoFair-FL achieves near-centralized fairness under privacy constraints, with only 2.3× computational overhead, and remains robust to attribute inference (adversarial probability < 0.05) (Ali et al., 18 Jan 2026). The baselines characterize the landscape for federated fairness, privacy, and utility under heterogeneous data splits, establishing FedFair-100 as an authoritative benchmark for privacy–fairness tradeoff analysis in federated learning.

