FairGAN: Fairness-Aware GAN Framework

Updated 21 February 2026
  • FairGAN is a framework for creating synthetic datasets that balance realistic data utility with enforced demographic parity.
  • Its dual-discriminator design and conditional MedGAN-inspired generator jointly optimize data utility, classification utility, and fairness.
  • Empirical tests on the UCI Adult dataset show a 75% reduction in risk difference with only a minimal drop in downstream classification accuracy.

FairGAN is a framework for learning generative models that produce synthetic datasets both statistically similar to the original data and free of discrimination under group fairness notions such as demographic parity. Unlike naive de-biasing or traditional generative adversarial networks (GANs), FairGAN jointly enforces data utility, data fairness, classification utility, and classifier fairness, ensuring that downstream classifiers trained on synthetic data do not reproduce historical disparities (Xu et al., 2018).

1. Formal Problem Setting and Fairness Criteria

Given a dataset $D = (X, Y, S) \sim P_{\text{data}}$, with $X \in \mathbb{R}^n$ denoting unprotected features, $S \in \{0,1\}$ a binary protected attribute (e.g., gender or race), and $Y \in \{0,1\}$ a binary outcome/label, FairGAN aims to learn a generator $G$ yielding a synthetic dataset $\hat{D} = (\hat{X}, \hat{Y}, \hat{S}) \sim P_G$ that satisfies four desiderata:

  • Data Utility: $P_G(\hat{X}, \hat{Y}) \approx P_{\text{data}}(X, Y)$.
  • Data Fairness: statistical parity in the synthetic labels, i.e., $P(\hat{Y}=1 \mid \hat{S}=1) = P(\hat{Y}=1 \mid \hat{S}=0)$.
  • Classification Utility: a classifier $\eta$ trained on $(\hat{X}, \hat{Y})$ yields high accuracy on real $X$.
  • Classification Fairness: classifier $\eta$ achieves $P(\eta(X)=1 \mid S=1) = P(\eta(X)=1 \mid S=0)$ on real $(X, S)$.

FairGAN explicitly removes disparate impact by ensuring $\hat{X}$ does not encode $S$; in practice, this is measured by the balanced error rate (BER) of a predictor $f : \hat{X} \rightarrow S$, with BER close to 0.5 indicating minimal leakage.
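The BER diagnostic is straightforward to compute. A minimal sketch (the function name `balanced_error_rate` is ours, not from the paper): it averages the two group-conditional error rates, so a predictor that is right half the time in each group scores 0.5, regardless of how imbalanced the groups are.

```python
import numpy as np

def balanced_error_rate(s_true, s_pred):
    """BER = 0.5 * (P(pred=1 | s=0) + P(pred=0 | s=1)).
    A value near 0.5 means the predictor cannot recover s from the features."""
    s_true = np.asarray(s_true)
    s_pred = np.asarray(s_pred)
    fpr = np.mean(s_pred[s_true == 0] == 1)  # predicted 1 when s = 0
    fnr = np.mean(s_pred[s_true == 1] == 0)  # predicted 0 when s = 1
    return 0.5 * (fpr + fnr)

rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=10_000)

# A predictor that guesses at random leaks nothing about s: BER close to 0.5
random_guess = rng.integers(0, 2, size=10_000)
print(balanced_error_rate(s, random_guess))

# A perfect predictor fully leaks s: BER = 0.0
print(balanced_error_rate(s, s))  # 0.0
```

On the real Adult features the paper reports BER ≈ 0.15 (strong leakage of sex), while FairGAN's synthetic features push it toward 0.5.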

2. Architecture and Learning Objectives

FairGAN uses a conditional MedGAN-inspired generator and two adversarial discriminators:

  • Generator ($G_{\text{dec}}$): accepts noise $z \sim P_z$ and protected attribute $s$. It first produces a latent embedding via $G(z, s)$, then decodes to mixed discrete/continuous synthetic features $(\hat{X}, \hat{Y})$ using a pre-trained decoder (from an autoencoder). The output is $(\hat{X}, \hat{Y}, \hat{S})$ with $\hat{S} = s$.
  • Discriminator $D_1$ (utility critic): receives $(X, Y, S)$ and distinguishes real samples from generated ones.
  • Discriminator $D_2$ (fairness critic): receives $(\hat{X}, \hat{Y})$ and predicts $s$. By minimizing $D_2$'s ability to recover $s$, FairGAN forces $G$ to remove undesirable correlations.
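The data flow above can be sketched in a few lines of numpy. This is a toy single-layer version: the dimensions, weight initialization, and activations are illustrative assumptions, not the paper's hyperparameters; the point is only which component sees which inputs.

```python
import numpy as np

rng = np.random.default_rng(42)
NOISE_DIM, LATENT_DIM, FEAT_DIM = 16, 32, 57  # FEAT_DIM matches the one-hot Adult encoding

def linear(in_dim, out_dim):
    return rng.normal(0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

# Generator G(z, s): noise + protected attribute -> latent code
W_g, b_g = linear(NOISE_DIM + 1, LATENT_DIM)
# Decoder Dec (pre-trained in the autoencoder phase, then held fixed): latent -> (x_hat, y_hat)
W_dec, b_dec = linear(LATENT_DIM, FEAT_DIM + 1)

def generate(z, s):
    latent = np.tanh(np.concatenate([z, s[:, None]], axis=1) @ W_g + b_g)
    out = 1 / (1 + np.exp(-(latent @ W_dec + b_dec)))  # sigmoids for one-hot/binary fields
    x_hat, y_hat = out[:, :FEAT_DIM], out[:, FEAT_DIM]
    return x_hat, y_hat, s  # s passes through unchanged: s_hat = s

# D1 (utility critic) scores full (x, y, s) triples; D2 (fairness critic) only sees (x, y)
W_d1, b_d1 = linear(FEAT_DIM + 2, 1)
W_d2, b_d2 = linear(FEAT_DIM + 1, 1)

batch = 8
z = rng.normal(size=(batch, NOISE_DIM))
s = rng.integers(0, 2, size=batch).astype(float)
x_hat, y_hat, s_hat = generate(z, s)
print(x_hat.shape, y_hat.shape)  # (8, 57) (8,)
```

Because $D_2$ never receives $\hat{S}$ as an input, the only way it can predict $s$ is through correlations that $G$ leaves in $(\hat{X}, \hat{Y})$, which is exactly what the adversarial game removes.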

The full training objective is a dual-minimax game:

$$\min_G \; \max_{D_1, D_2} \; V_1(G, D_1) + \lambda V_2(G, D_2)$$

where:

$$V_1(G, D_1) = \mathbb{E}_{(x, y, s) \sim P_{\text{data}}} [\log D_1(x, y, s)] + \mathbb{E}_{z, s} [\log(1 - D_1(G(z, s), s))]$$

$$V_2(G, D_2) = \mathbb{E}_{z} [\log D_2(G(z, s{=}1))] + \mathbb{E}_{z} [\log(1 - D_2(G(z, s{=}0)))]$$

$\lambda > 0$ is a hyperparameter that trades off data fidelity and fairness. $V_1$ is the standard conditional GAN loss; $V_2$ regularizes for independence between $(\hat{X}, \hat{Y})$ and $S$ (Xu et al., 2018).
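Evaluated on minibatches of discriminator outputs, the two value functions are just averaged log-probabilities. A minimal sketch (function names are ours): note that at the fairness equilibrium, where $D_2$ outputs 0.5 on every sample because $(\hat{X}, \hat{Y})$ carries no information about $s$, $V_2$ sits at its minimax value $2\log 0.5$.

```python
import numpy as np

def v1_value(d1_real, d1_fake):
    """Standard GAN value V1: D1 ascends log D1(real) + log(1 - D1(fake))."""
    return np.mean(np.log(d1_real)) + np.mean(np.log(1 - d1_fake))

def v2_value(d2_on_s1, d2_on_s0):
    """Fairness value V2: D2 tries to tell samples generated with s=1
    from samples generated with s=0; G minimizes this ability."""
    return np.mean(np.log(d2_on_s1)) + np.mean(np.log(1 - d2_on_s0))

def generator_objective(d1_fake, d2_on_s1, d2_on_s0, lam=1.0):
    """The terms of V1 + lambda * V2 that depend on G (to be minimized)."""
    return np.mean(np.log(1 - d1_fake)) + lam * v2_value(d2_on_s1, d2_on_s0)

# At the fairness equilibrium D2 outputs 0.5 everywhere: V2 = 2 * log(0.5)
half = np.full(4, 0.5)
print(v2_value(half, half))  # -1.386...
```

Larger $\lambda$ simply scales how much a confident $D_2$ penalizes the generator relative to the fidelity term.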

3. Training Procedure and Control of Fairness–Utility Trade-off

Training involves two phases:

  1. Pre-train an autoencoder $(\mathrm{Enc}, \mathrm{Dec})$ on the real $(X, Y)$ for efficient mixed-type reconstruction.
  2. Alternately update $D_1$, $G$, $D_2$, and again $G$ using minibatch stochastic gradient descent with Adam, applying the $V_1$ and $\lambda$-scaled $V_2$ losses in sequence to optimize data utility and fairness.
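The alternation in phase 2 can be sketched as a per-minibatch schedule. The four `update_*` callbacks below are hypothetical stand-ins for one Adam step on the corresponding loss term; the point is the update order: utility critic, generator, fairness critic, generator again.

```python
def train_step(batch, lam, update_d1, update_g_utility, update_d2, update_g_fairness):
    update_d1(batch)               # ascend V1 with respect to D1
    update_g_utility(batch)        # descend V1 with respect to G
    update_d2(batch)               # ascend V2 with respect to D2
    update_g_fairness(batch, lam)  # descend lambda * V2 with respect to G

# Trace the order of updates with recording stubs
calls = []
train_step("minibatch", 1.0,
           lambda b: calls.append("D1"),
           lambda b: calls.append("G:V1"),
           lambda b: calls.append("D2"),
           lambda b, lam: calls.append("G:V2"))
print(calls)  # ['D1', 'G:V1', 'D2', 'G:V2']
```

Updating $G$ twice per step, once against each critic, keeps the fidelity and fairness pressures applied at the same cadence rather than letting one adversary race ahead of the other.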

Adjustment of $\lambda$ interpolates between near-perfect data utility ($\lambda \approx 0$, equivalent to a standard conditional GAN) and strong fairness ($\lambda \gg 0$), allowing the practitioner to tune the fairness–utility trade-off according to application needs.

4. Empirical Evaluation and Results

Experiments on the UCI Adult dataset (48,842 instances; 57-dimensional one-hot encoded features; protected attribute: sex; label: income >50K) compare FairGAN against:

  • SYN1-GAN: standard conditional GAN
  • SYN2-NFGAN-I: GAN on $(X, Y)$ with random reassignment of $S$
  • SYN3-NFGAN-II: two-discriminator GAN enforcing $P_G(X, Y \mid s=0) = P_G(X, Y \mid s=1)$, neglecting the match to the real data distribution
  • SYN4-FairGAN: full objective with $\lambda = 1$

Key metrics and representative results:

| Metric | Real | SYN1-GAN | SYN2-NFGAN-I | SYN3-NFGAN-II | SYN4-FairGAN |
|---|---|---|---|---|---|
| Risk diff $\Delta_{\text{data}}$ | 0.1989 | 0.1798±0.0026 | 0.0025±0.0007 | 0.0062±0.0037 | 0.0411±0.0295 |
| BER (predict $S$ from $X$) | 0.1538 | — | — | — | 0.3862±0.0036 |
| Data utility ($\ell_2$ joint dist.) | — | 0.0198±0.0002 | — | — | 0.0208±0.0005 |
| SVM-Lin SYN2REAL accuracy | 84.69% | 83.63±1.08% | — | — | 82.17±0.93% |
| SVM-Lin SYN2REAL risk diff | 0.1784 | 0.1712±0.0062 | — | — | 0.0461±0.0424 |

FairGAN reduces the risk difference in classifier predictions by approximately 75% at a cost of roughly 2% absolute decrease in accuracy for downstream tasks (Xu et al., 2018).
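The risk-difference metric behind these numbers is the demographic-parity gap of a set of predictions. A minimal sketch (the function name is ours):

```python
import numpy as np

def risk_difference(y_pred, s):
    """Risk difference (demographic-parity gap):
    RD = |P(pred = 1 | s = 1) - P(pred = 1 | s = 0)|."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(np.mean(y_pred[s == 1]) - np.mean(y_pred[s == 0]))

# Sanity checks: predictions identical to s give the maximal gap,
# predictions independent of s give zero gap
print(risk_difference([1, 1, 0, 0], [1, 1, 0, 0]))  # 1.0
print(risk_difference([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0

# The reported classifier gap drops from ~0.1784 (trained on real data)
# to ~0.0461 (trained on FairGAN data)
print(round(1 - 0.0461 / 0.1784, 3))  # 0.742
```

That ratio is where the roughly 75% reduction figure comes from.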

5. Extensions: Transfer and Reprogramming

A VAE-based reprogramming of FairGAN facilitates adaptation to new tabular datasets and tasks without retraining the entire model. A variational autoencoder’s decoder, pre-trained on the source data, acts as a fixed front-end for a new task-specific encoder and adversarial heads. This modularity enables rapid, resource-light transfer while maintaining the original targets of utility, fairness, and classifier performance. Trade-offs associated with this approach include increased hyperparameter sensitivity and possible convergence challenges in aligning fairness and accuracy on new domains (Nobile et al., 2022).

6. Limitations, Failure Modes, and Theoretical Constraints

FairGAN, in its original formulation, enforces only demographic parity; further group fairness notions (e.g., equalized odds, calibration) are not guaranteed and would require modifications such as an additional adversarial head predicting $S$ conditioned on $Y$ (Xu et al., 2018). Extension to multiple or non-binary protected attributes is not addressed in the initial proposal. Like other GAN-based approaches, FairGAN is prone to mode collapse and training instability, necessitating careful monitoring. Empirically, values of $\lambda \in [0.5, 2]$ provide effective fairness–utility balances, but extreme weighting can degrade performance on either axis. In reprogrammed FairGANs, attaining perfect independence (50% $D_2$ accuracy) may not be achievable in all transfer scenarios without prohibitive utility loss, and the choice of latent dimensionality mediates a trade-off between representational capacity and privacy/sensitivity leakage (Nobile et al., 2022).

7. Significance and Impact on Fair Synthetic Data Generation

FairGAN is the first GAN-based approach for generating discrimination-free tabular data that effectively separates utility (realism) and fairness objectives via a dual-discriminator setup and a single interpretable trade-off parameter. It provides a pre-processing solution for de-biasing data such that both the generated datasets and the downstream classifiers inherit reduced disparate impact and disparate treatment. This has established FairGAN as a canonical reference point for subsequent fairness-aware generative modeling in structured data, with extensions such as VAE reprogramming further broadening its applicability (Xu et al., 2018, Nobile et al., 2022).
