FedFair-100 Benchmark for Federated Fairness
- FedFair-100 is a synthetic tabular benchmark designed to stress-test privacy-preserving and fairness-aware federated learning protocols under extreme non-IID data conditions.
- It simulates 100 federated clients with imbalanced, realistic demographic distributions, enforcing label–attribute correlations and highlighting fairness challenges.
- The benchmark measures utility via accuracy, F1-score, and AUROC, and evaluates fairness through demographic parity and equalized odds under varying privacy budgets.
FedFair-100 is a synthetic tabular benchmark specifically designed to evaluate privacy-preserving and fairness-aware federated learning protocols under conditions of extreme heterogeneity and non-IID data partitioning. Developed for stress-testing fairness verification schemes, it emulates large-scale demographic variation and label–attribute relationships using statistical calibration to U.S. census distributions. As a primary testbed in CryptoFair-FL, FedFair-100 enables rigorous analysis of demographic parity and equalized odds violations—alongside utility—across privacy budgets, delivering reproducible results for comparison of federated and centralized approaches (Ali et al., 18 Jan 2026).
1. Construction and Data Generation
FedFair-100 comprises one million records ($N = 10^6$) generated from mixtures of tabular distributions. Parameters, including age, income, and education, are selected to match realistic U.S. census statistics. Each record is a triplet $(x_i, y_i, a_i)$, where $x_i \in \mathbb{R}^d$ denotes the feature vector, $y_i \in \{0, 1\}$ the binary label, and $a_i \in \{0, 1\}$ the binary protected attribute. Across 100 simulated institutional clients, the prevalence of the protected attribute, $P(a = 1)$, varies uniformly from 5% to 45%, systematically inducing strong non-IID data heterogeneity. Conditional label–attribute probabilities $P(y \mid a)$ are institution-specific, generating diverse fairness challenges. Preprocessing normalizes features to zero mean and unit variance per institution; binary features are encoded as $\{0, 1\}$ indicators, categorical features are one-hot encoded, and continuous features are clipped to a fixed range after normalization.
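The generation recipe above can be sketched in NumPy. This is an illustrative reconstruction, not the released generator: the feature dimension ($d = 32$), the mixture shifts, the conditional label probabilities, and the clipping range $[-3, 3]$ are all assumptions; only the $N = 10^6$ scale, the 5%–45% attribute sweep, and the per-institution standardization come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_per_client, d = 100, 10_000, 32  # d = 32 is an assumption; the true dimension is unspecified

def make_client(k):
    """Generate one client's (X, y, a) triple with a swept protected-attribute rate."""
    p_a = 0.05 + 0.40 * k / (K - 1)            # P(a=1) sweeps uniformly from 5% to 45%
    a = rng.binomial(1, p_a, size=n_per_client)
    # Institution-specific conditional label probabilities P(y=1 | a) (illustrative values)
    p0, p1 = 0.3 + 0.2 * rng.random(), 0.4 + 0.3 * rng.random()
    y = rng.binomial(1, np.where(a == 1, p1, p0))
    # Features: Gaussian base shifted by label and attribute (mixture-of-distributions stand-in)
    X = rng.normal(size=(n_per_client, d)) + 0.5 * y[:, None] + 0.3 * a[:, None]
    # Per-institution preprocessing: zero mean, unit variance, then clip (range assumed)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    return np.clip(X, -3.0, 3.0), y, a

X0, y0, a0 = make_client(0)   # lowest-prevalence client: P(a=1) = 5%
```

Generating all 100 clients this way reproduces the benchmark's key structural property: attribute prevalence and label–attribute coupling differ systematically across institutions.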
2. Data Partitioning and Client Distribution
FedFair-100 simulates 100 federated learning clients ($K = 100$), each assigned approximately $N/K = 10{,}000$ samples, randomly perturbed to reflect realistic institutional imbalance. The partitioning is highly non-IID: each client is characterized by its own protected-attribute rate $P(a = 1)$ and conditional label–attribute probabilities $P(y \mid a)$. This protocol enforces client-level demographic and label correlations, mirroring regulatory and real-world deployment scenarios.
| Attribute | Range/Type | Handling |
|---|---|---|
| Client count ($K$) | 100 | Fixed simulation |
| Records per client | $\approx N/K$, perturbed | Imbalanced, non-IID |
| $P(a = 1)$ per client | 5%–45% | Uniform sweep |
| Label–attribute $P(y \mid a)$ | Varied client-wise | Induces fairness variations |
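The partition summarized above is inexpensive to simulate. A minimal sketch, assuming a ±20% per-client size perturbation (the benchmark only states "imbalanced"; the exact magnitude is unspecified):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 1_000_000, 100
base = N // K                                    # = 10,000 records per client before perturbation
# +/-20% perturbation is an assumption, chosen only to illustrate institutional imbalance
sizes = (base * (1 + rng.uniform(-0.2, 0.2, size=K))).astype(int)
p_a = np.linspace(0.05, 0.45, K)                 # client-wise protected-attribute rates
```

Each client `k` would then draw `sizes[k]` records with attribute prevalence `p_a[k]`, yielding the non-IID demographic skew the table describes.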
3. Feature, Label, and Protected Attribute Spaces
The feature space is $d$-dimensional, spanning mixed continuous and categorical tabular variables. The label space is binary: $y \in \{0, 1\}$. The protected attribute $a \in \{0, 1\}$ is explicitly represented for each record and retained locally at the client level, never centralized, thereby satisfying privacy constraints. The client-wise variation in $P(a = 1)$ across institutions is central to the benchmark's role in evaluating fairness enforcement.
4. Benchmark Tasks, Model Classes, and Evaluation Settings
FedFair-100 assesses binary classification under federated learning with explicit fairness constraints. Standard model classes are logistic regression and small neural networks $f_\theta$. Evaluation employs fairness metrics, demographic parity (DP) and equalized odds (EO), alongside utility measures: classification accuracy, F1-score, and AUROC. Data are split per client into 80/20 train/test partitions stratified by label and protected attribute. The protocol stipulates full client participation (no subsampling) in each of 100 communication rounds (FedAvg) or 106 rounds (CryptoFair-FL) to convergence. Privacy budgets $\varepsilon$ are swept from 0.25 to 2.0 in increments of 0.25; each experimental condition is repeated over 5 independent runs, with mean and standard deviation reported.
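The evaluation loop can be illustrated end to end in plain NumPy: a jointly stratified 80/20 split followed by accuracy, F1, and a rank-based AUROC. The synthetic scores and the 0.3 decision threshold below are placeholders, not the benchmark's actual models:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
y = rng.binomial(1, 0.4, n)                        # labels
a = rng.binomial(1, 0.3, n)                        # protected attribute
score = 0.6 * y + 0.2 * a + rng.normal(0, 0.5, n)  # stand-in model score

# 80/20 split stratified jointly by (label, protected attribute)
test_idx = []
for yy in (0, 1):
    for aa in (0, 1):
        idx = np.flatnonzero((y == yy) & (a == aa))
        rng.shuffle(idx)
        test_idx.append(idx[: len(idx) // 5])      # 20% of each stratum goes to test
test_idx = np.concatenate(test_idx)

yt, st = y[test_idx], score[test_idx]
pred = (st > 0.3).astype(int)                      # illustrative decision threshold

acc = (pred == yt).mean()
tp = ((pred == 1) & (yt == 1)).sum()
fp = ((pred == 1) & (yt == 0)).sum()
fn = ((pred == 0) & (yt == 1)).sum()
f1 = 2 * tp / (2 * tp + fp + fn)

# AUROC via the Mann-Whitney rank-sum identity (ties negligible for continuous scores)
ranks = np.empty(len(st))
ranks[st.argsort()] = np.arange(1, len(st) + 1)
n_pos, n_neg = (yt == 1).sum(), (yt == 0).sum()
auroc = (ranks[yt == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Joint stratification matters here: splitting on the label alone could leave a test stratum with too few minority-group samples for stable DP/EO estimates.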
5. Formal Fairness Metric Definitions
FedFair-100 operationalizes explicit definitions for fairness:
Demographic Parity Violation

$$\Delta_{\mathrm{DP}} = \left| P(\hat{y} = 1 \mid a = 0) - P(\hat{y} = 1 \mid a = 1) \right|,$$

where $\hat{y} = f_\theta(x)$ denotes the model prediction.

Equalized Odds Violation

$$\Delta_{\mathrm{EO}} = \max_{y \in \{0, 1\}} \left| P(\hat{y} = 1 \mid a = 0, y) - P(\hat{y} = 1 \mid a = 1, y) \right|.$$

In practice, fairness metrics are computed by aggregating client-wise counts,

$$C_s = \sum_{k=1}^{K} c_{k,s}, \qquad N_s = \sum_{k=1}^{K} n_{k,s},$$

where $c_{k,s}$ is the number of positive predictions and $n_{k,s}$ the number of samples in group $a = s$ at client $k$. The decrypted noisy sums yield empirical rates for DP, $\hat{P}(\hat{y} = 1 \mid a = s) = C_s / N_s$; analogous procedures apply for EO, with counts additionally stratified by the true label $y$.
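The count-aggregation pipeline can be mimicked without any cryptography: each client reports Laplace-noised group counts (standing in for its encrypted contribution), and the server forms empirical rates from the aggregated sums. The numeric settings here (per-round $\varepsilon$, client size, the injected disparity) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
K, eps_round = 100, 0.1                  # per-round privacy budget is an assumption

def client_counts(n=1_000):
    """One client's noised counts: positives c_{k,s} and totals n_{k,s} for s in {0, 1}."""
    a = rng.binomial(1, rng.uniform(0.05, 0.45), n)
    yhat = rng.binomial(1, np.where(a == 1, 0.55, 0.45))  # illustrative DP gap of ~0.10
    c = np.array([(yhat[a == s] == 1).sum() for s in (0, 1)], float)
    n_s = np.array([(a == s).sum() for s in (0, 1)], float)
    noise = rng.laplace(0.0, 1.0 / eps_round, size=4)     # sensitivity-1 Laplace mechanism
    return c + noise[:2], n_s + noise[2:]

C, Ntot = np.zeros(2), np.zeros(2)
for _ in range(K):                       # server-side aggregation of all clients' sums
    c, n_s = client_counts()
    C += c
    Ntot += n_s

rates = C / Ntot                         # empirical P(y_hat = 1 | a = s) for s = 0, 1
dp_violation = abs(rates[0] - rates[1])
```

Because the Laplace noise is added per client but averaged over 100 clients and roughly $10^5$ samples, the aggregated rates remain close to the true group rates, which is exactly the regime the benchmark's noisy-sum protocol relies on.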
6. Experimental Protocol Details and Cryptographic Privacy
FedFair-100 adopts an 80/20 stratified train–test split by label and protected attribute. Communication proceeds over 100–106 rounds per run, with no client subsampling; all 100 clients participate in every round. CryptoFair-FL utilizes threshold decryption in a $t$-of-$n$ scheme and adds Laplace noise per client for differential privacy, with the noise scale calibrated to the per-round budget. Privacy accounting leverages advanced composition via the moments accountant, consuming the total privacy budget $\varepsilon$ over all rounds.
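The benefit of composition-aware accounting over naive budget splitting can be checked numerically. The moments accountant gives tighter bounds still; as a hedged stand-in, the sketch below uses the classical advanced-composition theorem with the protocol's $T = 106$ rounds and assumed $\varepsilon_{\mathrm{total}} = 1.0$, $\delta = 10^{-5}$:

```python
import math

T, eps_total, delta = 106, 1.0, 1e-5     # T from the protocol; eps_total and delta assumed

eps_basic = eps_total / T                # naive (basic) composition: split the budget evenly

def advanced_total(eps_r):
    """Total epsilon cost of T rounds at eps_r each, via the advanced-composition theorem."""
    return math.sqrt(2 * T * math.log(1 / delta)) * eps_r + T * eps_r * (math.exp(eps_r) - 1)

# Find the largest per-round eps_r whose advanced-composition total fits within eps_total
lo, hi = 0.0, eps_total
for _ in range(60):                      # bisection to high precision
    mid = (lo + hi) / 2
    if advanced_total(mid) <= eps_total:
        lo = mid
    else:
        hi = mid
eps_advanced = lo                        # noticeably larger than the naive per-round budget
```

A larger admissible per-round $\varepsilon_r$ means less Laplace noise per round for the same total budget, which is why tighter accounting directly improves the fairness-estimate quality reported on FedFair-100.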
7. Baseline Protocols and Key Results
FedFair-100 provides comparative reference for several federated learning paradigms:
| Baseline | $\Delta_{\mathrm{DP}}$ | AUROC |
|---|---|---|
| FedAvg | | |
| Local Fair | | |
| SecAgg-NoFair | Uncontrolled | — |
| CryptoFair-FL | 0.033 | 0.85 |
| Centralized Fair | 0.018 | 0.86 |
The tradeoff curve of $\Delta_{\mathrm{DP}}$ versus $\varepsilon$ for CryptoFair-FL on FedFair-100 demonstrates monotonic improvement in fairness violation with higher privacy budgets.
Experimental results demonstrate that CryptoFair-FL achieves near-centralized fairness under privacy constraints, with only 2.3× computational overhead, and remains robust to attribute inference (adversarial probability < 0.05) (Ali et al., 18 Jan 2026). The baselines characterize the landscape for federated fairness, privacy, and utility under heterogeneous data splits, establishing FedFair-100 as an authoritative benchmark for privacy–fairness tradeoff analysis in federated learning.