FedFair-100 Benchmark for Federated Fairness
- FedFair-100 is a synthetic tabular benchmark designed to stress-test privacy-preserving and fairness-aware federated learning protocols under extreme non-IID data conditions.
- It simulates 100 federated clients with imbalanced, realistic demographic distributions, enforcing label–attribute correlations and highlighting fairness challenges.
- The benchmark measures utility via accuracy, F1-score, and AUROC, and evaluates fairness through demographic parity and equalized odds under varying privacy budgets.
FedFair-100 is a synthetic tabular benchmark specifically designed to evaluate privacy-preserving and fairness-aware federated learning protocols under conditions of extreme heterogeneity and non-IID data partitioning. Developed for stress-testing fairness verification schemes, it emulates large-scale demographic variation and label–attribute relationships using statistical calibration to U.S. census distributions. As a primary testbed in CryptoFair-FL, FedFair-100 enables rigorous analysis of demographic parity and equalized odds violations—alongside utility—across privacy budgets, delivering reproducible results for comparison of federated and centralized approaches (Ali et al., 18 Jan 2026).
1. Construction and Data Generation
FedFair-100 comprises one million records ($N = 10^6$) generated from mixtures of tabular distributions. Parameters, including age, income, and education, are selected to match realistic U.S. census statistics. Each record is a triplet $(x_i, y_i, a_i)$, where $x_i \in \mathbb{R}^d$ denotes the feature vector, $y_i \in \{0, 1\}$ the binary label, and $a_i \in \{0, 1\}$ the binary protected attribute. Across 100 simulated institutional clients, the prevalence of the protected attribute, $P(a = 1)$, varies uniformly from 5% to 45%, systematically inducing strong non-IID data heterogeneity. Conditional label–attribute probabilities $P(y \mid a)$ are institution-specific, generating diverse fairness challenges. Preprocessing normalizes features to zero mean and unit variance per institution; binary features are encoded as $\{0, 1\}$ indicators, categorical features are one-hot encoded, and continuous features are clipped to a fixed range after normalization.
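The generation recipe above can be sketched in NumPy. This is an illustrative reconstruction, not the released generator: the feature dimension ($d = 32$), the mixture shifts, the conditional label probabilities, and the clipping range $[-3, 3]$ are all assumptions; only the $N = 10^6$ scale, the 5%–45% attribute sweep, and the per-institution standardization come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_per_client, d = 100, 10_000, 32  # d = 32 is an assumption; the true dimension is unspecified

def make_client(k):
    """Generate one client's (X, y, a) triple with a swept protected-attribute rate."""
    p_a = 0.05 + 0.40 * k / (K - 1)            # P(a=1) sweeps uniformly from 5% to 45%
    a = rng.binomial(1, p_a, size=n_per_client)
    # Institution-specific conditional label probabilities P(y=1 | a) (illustrative values)
    p0, p1 = 0.3 + 0.2 * rng.random(), 0.4 + 0.3 * rng.random()
    y = rng.binomial(1, np.where(a == 1, p1, p0))
    # Features: Gaussian base shifted by label and attribute (mixture-of-distributions stand-in)
    X = rng.normal(size=(n_per_client, d)) + 0.5 * y[:, None] + 0.3 * a[:, None]
    # Per-institution preprocessing: zero mean, unit variance, then clip (range assumed)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    return np.clip(X, -3.0, 3.0), y, a

X0, y0, a0 = make_client(0)   # lowest-prevalence client: P(a=1) = 5%
```

Generating all 100 clients this way reproduces the benchmark's key structural property: attribute prevalence and label–attribute coupling differ systematically across institutions.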
2. Data Partitioning and Client Distribution
FedFair-100 simulates 100 federated learning clients ($K = 100$), each assigned approximately $N/K = 10{,}000$ samples, randomly perturbed to reflect realistic institutional imbalance. The partitioning is highly non-IID: each client is characterized by its own protected-attribute rate $P(a = 1)$ and conditional label–attribute probabilities $P(y \mid a)$. This protocol enforces client-level demographic and label correlations, mirroring regulatory and real-world deployment scenarios.
| Attribute | Range/Type | Handling |
|---|---|---|
| Client count ($K$) | 100 | Fixed simulation |
| Records per client | $\approx N/K$, perturbed | Imbalanced, non-IID |
| $P(a = 1)$ per client | 5%–45% | Uniform sweep |
| Label–attribute $P(y \mid a)$ | Varied client-wise | Induces fairness variations |
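The partition summarized above is inexpensive to simulate. A minimal sketch, assuming a ±20% per-client size perturbation (the benchmark only states "imbalanced"; the exact magnitude is unspecified):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 1_000_000, 100
base = N // K                                    # = 10,000 records per client before perturbation
# +/-20% perturbation is an assumption, chosen only to illustrate institutional imbalance
sizes = (base * (1 + rng.uniform(-0.2, 0.2, size=K))).astype(int)
p_a = np.linspace(0.05, 0.45, K)                 # client-wise protected-attribute rates
```

Each client `k` would then draw `sizes[k]` records with attribute prevalence `p_a[k]`, yielding the non-IID demographic skew the table describes.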
3. Feature, Label, and Protected Attribute Spaces
The feature space is $d$-dimensional, spanning mixed continuous and categorical tabular variables. The label space is binary: $y \in \{0, 1\}$. The protected attribute $a \in \{0, 1\}$ is explicitly represented for each record and retained locally at the client level, never centralized, thereby satisfying privacy constraints. The client-wise variation in $P(a = 1)$ across institutions is central to the benchmark's role in evaluating fairness enforcement.
4. Benchmark Tasks, Model Classes, and Evaluation Settings
FedFair-100 assesses binary classification under federated learning with explicit fairness constraints. Standard model classes are logistic regression and small neural networks $f_\theta$. Evaluation employs fairness metrics, demographic parity (DP) and equalized odds (EO), alongside utility measures: classification accuracy, F1-score, and AUROC. Data are split per client into 80/20 train/test partitions stratified by label and protected attribute. The protocol stipulates full client participation (no subsampling) in each of 100 communication rounds (FedAvg) or 106 rounds (CryptoFair-FL) to convergence. Privacy budgets $\varepsilon$ are swept from 0.25 to 2.0 in increments of 0.25; each experimental condition is repeated over 5 independent runs, with mean and standard deviation reported.
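The evaluation loop can be illustrated end to end in plain NumPy: a jointly stratified 80/20 split followed by accuracy, F1, and a rank-based AUROC. The synthetic scores and the 0.3 decision threshold below are placeholders, not the benchmark's actual models:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
y = rng.binomial(1, 0.4, n)                        # labels
a = rng.binomial(1, 0.3, n)                        # protected attribute
score = 0.6 * y + 0.2 * a + rng.normal(0, 0.5, n)  # stand-in model score

# 80/20 split stratified jointly by (label, protected attribute)
test_idx = []
for yy in (0, 1):
    for aa in (0, 1):
        idx = np.flatnonzero((y == yy) & (a == aa))
        rng.shuffle(idx)
        test_idx.append(idx[: len(idx) // 5])      # 20% of each stratum goes to test
test_idx = np.concatenate(test_idx)

yt, st = y[test_idx], score[test_idx]
pred = (st > 0.3).astype(int)                      # illustrative decision threshold

acc = (pred == yt).mean()
tp = ((pred == 1) & (yt == 1)).sum()
fp = ((pred == 1) & (yt == 0)).sum()
fn = ((pred == 0) & (yt == 1)).sum()
f1 = 2 * tp / (2 * tp + fp + fn)

# AUROC via the Mann-Whitney rank-sum identity (ties negligible for continuous scores)
ranks = np.empty(len(st))
ranks[st.argsort()] = np.arange(1, len(st) + 1)
n_pos, n_neg = (yt == 1).sum(), (yt == 0).sum()
auroc = (ranks[yt == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Joint stratification matters here: splitting on the label alone could leave a test stratum with too few minority-group samples for stable DP/EO estimates.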
5. Formal Fairness Metric Definitions
FedFair-100 operationalizes explicit definitions for fairness:
Demographic Parity Violation

$$\Delta_{\mathrm{DP}} = \left| P(\hat{y} = 1 \mid a = 0) - P(\hat{y} = 1 \mid a = 1) \right|,$$

where $\hat{y} = f_\theta(x)$ denotes the model prediction.

Equalized Odds Violation

$$\Delta_{\mathrm{EO}} = \max_{y \in \{0, 1\}} \left| P(\hat{y} = 1 \mid a = 0, y) - P(\hat{y} = 1 \mid a = 1, y) \right|.$$

In practice, fairness metrics are computed by aggregating client-wise counts,

$$C_s = \sum_{k=1}^{K} c_{k,s}, \qquad N_s = \sum_{k=1}^{K} n_{k,s},$$

where $c_{k,s}$ is the number of positive predictions and $n_{k,s}$ the number of samples in group $a = s$ at client $k$. The decrypted noisy sums yield empirical rates for DP, $\hat{P}(\hat{y} = 1 \mid a = s) = C_s / N_s$; analogous procedures apply for EO, with counts additionally stratified by the true label $y$.
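The count-aggregation pipeline can be mimicked without any cryptography: each client reports Laplace-noised group counts (standing in for its encrypted contribution), and the server forms empirical rates from the aggregated sums. The numeric settings here (per-round $\varepsilon$, client size, the injected disparity) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
K, eps_round = 100, 0.1                  # per-round privacy budget is an assumption

def client_counts(n=1_000):
    """One client's noised counts: positives c_{k,s} and totals n_{k,s} for s in {0, 1}."""
    a = rng.binomial(1, rng.uniform(0.05, 0.45), n)
    yhat = rng.binomial(1, np.where(a == 1, 0.55, 0.45))  # illustrative DP gap of ~0.10
    c = np.array([(yhat[a == s] == 1).sum() for s in (0, 1)], float)
    n_s = np.array([(a == s).sum() for s in (0, 1)], float)
    noise = rng.laplace(0.0, 1.0 / eps_round, size=4)     # sensitivity-1 Laplace mechanism
    return c + noise[:2], n_s + noise[2:]

C, Ntot = np.zeros(2), np.zeros(2)
for _ in range(K):                       # server-side aggregation of all clients' sums
    c, n_s = client_counts()
    C += c
    Ntot += n_s

rates = C / Ntot                         # empirical P(y_hat = 1 | a = s) for s = 0, 1
dp_violation = abs(rates[0] - rates[1])
```

Because the Laplace noise is added per client but averaged over 100 clients and roughly $10^5$ samples, the aggregated rates remain close to the true group rates, which is exactly the regime the benchmark's noisy-sum protocol relies on.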
6. Experimental Protocol Details and Cryptographic Privacy
FedFair-100 adopts an 80/20 stratified train–test split by label and protected attribute. Communication proceeds over 100–106 rounds per run, with no client subsampling; all 100 clients participate in every round. CryptoFair-FL utilizes threshold decryption in a $t$-of-$n$ scheme and adds Laplace noise per client for differential privacy, with the noise scale calibrated to the per-round budget. Privacy accounting leverages advanced composition via the moments accountant, consuming the total privacy budget $\varepsilon$ over all rounds.
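The benefit of composition-aware accounting over naive budget splitting can be checked numerically. The moments accountant gives tighter bounds still; as a hedged stand-in, the sketch below uses the classical advanced-composition theorem with the protocol's $T = 106$ rounds and assumed $\varepsilon_{\mathrm{total}} = 1.0$, $\delta = 10^{-5}$:

```python
import math

T, eps_total, delta = 106, 1.0, 1e-5     # T from the protocol; eps_total and delta assumed

eps_basic = eps_total / T                # naive (basic) composition: split the budget evenly

def advanced_total(eps_r):
    """Total epsilon cost of T rounds at eps_r each, via the advanced-composition theorem."""
    return math.sqrt(2 * T * math.log(1 / delta)) * eps_r + T * eps_r * (math.exp(eps_r) - 1)

# Find the largest per-round eps_r whose advanced-composition total fits within eps_total
lo, hi = 0.0, eps_total
for _ in range(60):                      # bisection to high precision
    mid = (lo + hi) / 2
    if advanced_total(mid) <= eps_total:
        lo = mid
    else:
        hi = mid
eps_advanced = lo                        # noticeably larger than the naive per-round budget
```

A larger admissible per-round $\varepsilon_r$ means less Laplace noise per round for the same total budget, which is why tighter accounting directly improves the fairness-estimate quality reported on FedFair-100.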
7. Baseline Protocols and Key Results
FedFair-100 provides comparative reference for several federated learning paradigms:
| Baseline | $\Delta_{\mathrm{DP}}$ | AUROC |
|---|---|---|
| FedAvg | | |
| Local Fair | | |
| SecAgg-NoFair | Uncontrolled | — |
| CryptoFair-FL | 0.033 | 0.85 |
| Centralized Fair | 0.018 | 0.86 |
The tradeoff curve of $\Delta_{\mathrm{DP}}$ versus $\varepsilon$ for CryptoFair-FL on FedFair-100 demonstrates monotonic improvement in fairness violation with higher privacy budgets.
Experimental results demonstrate that CryptoFair-FL achieves near-centralized fairness under privacy constraints, with only 2.3× computational overhead, and remains robust to attribute inference (adversarial probability < 0.05) (Ali et al., 18 Jan 2026). The baselines characterize the landscape for federated fairness, privacy, and utility under heterogeneous data splits, establishing FedFair-100 as an authoritative benchmark for privacy–fairness tradeoff analysis in federated learning.