
Fair Minimax Optimal Regression

Updated 30 June 2025
  • Fair minimax optimal regression is a framework that designs regression estimators to minimize the maximum error while ensuring demographic parity across groups.
  • It leverages optimal transport to adjust outputs from standard regressors, achieving fairness through a principled post-processing step.
  • A meta-theorem supports the approach by providing universal error bounds that mix conventional regression risk with the complexity of transport map estimation.

Fair minimax optimal regression is the principle and practice of constructing regression estimators that achieve, subject to fairness constraints such as demographic parity, the lowest worst-case (maximal) loss across a family of data generating distributions or regression models. The topic encompasses the theoretical lower bounds (minimax risks) under fairness constraints, sharp upper bounds delivered by practical estimators, the correct post-processing architectures to ensure fairness, and general meta-theorems that validate minimax optimality in a model-agnostic manner.

1. Fair Minimax Optimal Regression: Core Concepts and Framework

In regression with multiple groups (indexed $s = 1, \dots, M$), demographic parity requires a regressor's output distribution to be identical across groups, that is, for all group pairs $s, s'$ and measurable sets $E$,

$$\mathbb{P}\big[ f_s(X^{(s)}) \in E \big] = \mathbb{P}\big[ f_{s'}(X^{(s')}) \in E \big].$$

The fair minimax optimal regression problem is then: for a given collection of possible data generating distributions, find the estimator minimizing the maximum mean squared error—measured against the fair Bayes-optimal regressor (i.e., the best achievable under demographic parity)—across all models in the class.
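As an illustration, demographic parity can be checked empirically by comparing a predictor's output distributions across groups, for example with a two-sample Kolmogorov–Smirnov statistic. The following is a sketch with our own function names and synthetic data, not code from the source:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: sup_t |F_a(t) - F_b(t)| over a common grid."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    F_a = np.searchsorted(a, grid, side="right") / len(a)
    F_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(F_a - F_b))

rng = np.random.default_rng(0)
# Outputs of a hypothetical regressor on two groups (synthetic).
out_s = rng.normal(0.0, 1.0, size=5000)
out_t = rng.normal(0.5, 1.0, size=5000)  # shifted: parity violated
print(ks_statistic(out_s, out_t))  # large value -> demographic parity violated
print(ks_statistic(out_s, out_s))  # 0 -> identical output distributions
```

A statistic near zero indicates (approximately) identical output laws across the two groups, which is the empirical analogue of the displayed parity condition.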

The accuracy of a (possibly random) predictive rule is quantified by

$$d^2_{\mu_{X,:}}(f_{:}, \bar{f}^*_{\mu,:}) = \sum_{s=1}^M w_s \int \big(f_s(x) - \bar{f}^*_{\mu,s}(x)\big)^2 \, \mu_{X,s}(dx),$$

where $\bar{f}^*_{\mu,s}$ is the fair Bayes-optimal regressor for group $s$ and $w_s$ are the group weights.
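An empirical analogue of this weighted groupwise distance is a $w_s$-weighted average of per-group mean squared deviations from a reference (e.g., fair Bayes-optimal) predictor. A minimal sketch, with names and toy arrays of our own choosing:

```python
import numpy as np

def weighted_group_mse(preds, refs, weights):
    """Empirical d^2: preds[s] and refs[s] are predictions on group s's samples."""
    return sum(w * np.mean((p - r) ** 2)
               for w, p, r in zip(weights, preds, refs))

# Toy data: two groups, two samples each.
preds = [np.array([1.0, 2.0]), np.array([0.0, 4.0])]
refs  = [np.array([1.0, 1.0]), np.array([0.0, 2.0])]
print(weighted_group_mse(preds, refs, weights=[0.5, 0.5]))  # 0.5*0.5 + 0.5*2.0 = 1.25
```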

The fair minimax error is thus

$$\bar{\mathcal{E}}_n(\mathcal{P}) = \inf_{\bar{f}_{n,:} \,\text{fair}} \; \sup_{\mu_{:} \in \mathcal{P}} \; \mathbb{E}_{\mu^n_{:}} \big[ d^2_{\mu_{X,:}}(\bar{f}_{n,:}, \bar{f}^*_{\mu,:}) \big].$$

2. Construction via Optimal Transport and Barycenters

For many regression settings, the population-level fair optimal predictor can be constructed via optimal transport maps to a Wasserstein barycenter of the groupwise predictive output distributions. Specifically, if $\mu_{f,s}$ is the law of the Bayes regressor output for group $s$, then

$$\bar{f}^*_{\mu, s}(x) = (\vartheta^*_{\mu_f, s} \circ f^*_{\mu, s})(x),$$

where $\vartheta^*_{\mu_f, s}$ is the optimal transport map from $\mu_{f,s}$ to the barycenter (weighted by $w_s$) of $\{\mu_{f,1}, \dots, \mu_{f,M}\}$.

This structure enables a post-processing approach: after fitting any regression model, apply transport maps (identified via empirical optimal transport to the barycenter) to achieve demographic parity.
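In the univariate case this construction is especially transparent: the $w_s$-weighted Wasserstein-2 barycenter has quantile function equal to the weighted average of the groupwise quantile functions, and the transport map from group $s$ is the composition of the group's CDF with the barycenter's quantile function. A self-contained sketch under these standard 1D facts (names and synthetic data are ours):

```python
import numpy as np

def barycenter_maps(samples_by_group, weights, n_quantiles=1000):
    """Return one callable transport map per group to the 1D W2 barycenter."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    # Groupwise empirical quantile functions on a shared grid.
    quantiles = [np.quantile(s, q) for s in samples_by_group]
    # Barycenter quantile function: weighted average of group quantiles.
    bar_q = sum(w * Q for w, Q in zip(weights, quantiles))

    def make_map(s):
        s_sorted = np.sort(s)
        def transport(x):
            # Group's empirical CDF, then the barycenter quantile function.
            u = np.searchsorted(s_sorted, x, side="right") / len(s_sorted)
            return np.interp(u, q, bar_q)
        return transport

    return [make_map(s) for s in samples_by_group]

rng = np.random.default_rng(1)
groups = [rng.normal(0, 1, 4000), rng.normal(2, 1, 4000)]
maps = barycenter_maps(groups, weights=[0.5, 0.5])
mapped = [m(g) for m, g in zip(maps, groups)]
# Both mapped groups now share the barycenter distribution (mean near 1.0).
print(np.mean(mapped[0]), np.mean(mapped[1]))
```

After the mapping, both groups' outputs follow (approximately) the same barycenter law, which is exactly the demographic parity property the post-processing step is designed to enforce.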

3. Meta-Theorem: Error Bound for Demographic Parity-Constrained Regression

The paper presents a meta-theorem that provides a universal upper bound for fair minimax error in terms of:

  • The conventional regression minimax error $\mathcal{E}_{n_s}(\mathcal{P}_s)$ for each group $s$, and
  • The complexity (metric entropy) of the transport map class used in the post-processing step.

Assume the family of transport maps has entropy parameters $(\alpha, \beta)$ and let $\tilde n = \min_s n_s / w_s$. Then there exists a universal constant $C$ such that

$$\bar{\mathcal{E}}_n(\mathcal{P}) \leq C \left( L^2 \sum_{s=1}^M w_s \, \mathcal{E}_{n_s}(\mathcal{P}_s) + \left( \frac{\tilde n}{\ln \tilde n} \right)^{-\alpha/(\alpha+\beta)} \right),$$

where $L$ is a universal Lipschitz constant for the map class.

Interpretation: The risk in fair regression may match the unconstrained minimax rate or be dominated by the statistical hardness of optimal transport map estimation, depending on the dimensionality and regularity of the output space.
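To make the trade-off concrete, one can compare the two terms of the bound numerically. The smoothness-based regression rate $n^{-2\beta_r/(2\beta_r + d)}$ below is the standard nonparametric benchmark; the specific parameter values are toy numbers of our own choosing, not from the source:

```python
import math

def regression_rate(n, smooth, d):
    """Standard nonparametric minimax MSE rate for Holder smoothness `smooth` in d dims."""
    return n ** (-2 * smooth / (2 * smooth + d))

def transport_rate(n_tilde, alpha, beta):
    """Transport-map estimation term from the meta-theorem bound."""
    return (n_tilde / math.log(n_tilde)) ** (-alpha / (alpha + beta))

n = 100_000
print(regression_rate(n, smooth=2, d=1))   # fast regime: n^{-4/5} = 1e-4
print(transport_rate(n, alpha=1, beta=1))  # slower: roughly (n / ln n)^{-1/2}
```

With these toy parameters the transport term dominates, illustrating the regime where map estimation, not regression, sets the overall rate.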

4. Post-Processing Algorithm for Fair Minimax Regression

The optimal fair regression procedure has the following structure:

  1. Fit a conventional regressor $f_{n,s}$ to each group $s$, using any minimax-optimal regression method (e.g., least squares, deep neural networks, nonparametric estimators).
  2. Estimate the empirical transport maps $\vartheta_{n,s}$ from each group's output distribution to the empirical barycenter, using a regularized potential learning method (e.g., as in Korotin et al. for Wasserstein barycenters).
  3. Compose to obtain the fair regressor: $\bar{f}_{n,s}(x) = \vartheta_{n,s}(f_{n,s}(x))$.
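The three steps above can be sketched end to end. This is our own simplification, not the paper's implementation: a single feature, least-squares base regressors, and univariate outputs, so the barycentric transport reduces to quantile matching rather than the regularized potential learning the text describes:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_group(x, y):
    """Step 1: conventional least-squares regressor for one group."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda x_new: slope * x_new + intercept

# Synthetic groups with different output locations and scales.
x0, x1 = rng.uniform(0, 1, 3000), rng.uniform(0, 1, 3000)
y0 = 1.0 * x0 + rng.normal(0, 0.1, 3000)
y1 = 2.0 * x1 + 1.0 + rng.normal(0, 0.1, 3000)

f0, f1 = fit_group(x0, y0), fit_group(x1, y1)
out0, out1 = f0(x0), f1(x1)

# Step 2: empirical transport maps to the W2 barycenter (quantile averaging).
q = np.linspace(0.0, 1.0, 500)
bar_q = 0.5 * np.quantile(out0, q) + 0.5 * np.quantile(out1, q)

def to_barycenter(out):
    out_sorted = np.sort(out)
    def theta(x_out):
        u = np.searchsorted(out_sorted, x_out, side="right") / len(out_sorted)
        return np.interp(u, q, bar_q)
    return theta

theta0, theta1 = to_barycenter(out0), to_barycenter(out1)

# Step 3: compose to obtain the fair regressor f_bar = theta o f.
fair0, fair1 = theta0(f0(x0)), theta1(f1(x1))
print(np.mean(fair0), np.mean(fair1))  # near-identical after post-processing
```

The composed predictors keep each group's within-group ranking (the 1D maps are monotone) while sharing a common output distribution, which is the demographic parity guarantee.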

This framework is modular: advances in unconstrained regression plug in directly, with the overall error guarantee governed by whichever of regression and transport map estimation is statistically harder.

5. Quantitative Improvements and Applicability

The meta-theorem allows for sub-root-$n$ rates in regular or smooth regimes and is applicable to a vastly broader family of data generation models than previous analyses. It directly implies that minimax optimality under a fairness constraint can be achieved by post-processing any minimax-efficient regressor, provided the transport step is implemented with sufficient statistical precision.

For example, if regressor outputs are univariate, transport maps are monotonic and thus simple to estimate, so the excess error due to fairness is negligible compared to the regression estimation error. For high-dimensional or less regular output spaces, map estimation error may dominate, as precisely quantified by the entropy parameters $(\alpha, \beta)$.

6. Assumptions and Limitations

The theoretical guarantees rest on several key requirements:

  • The regression model class $\mathcal{P}$ must admit known minimax rates.
  • The output distributions under the Bayes optimal regressors (per group) must satisfy regularity conditions, typically boundedness and a Poincaré-type inequality.
  • There must be sufficient samples per group (i.e., group sample sizes should not be too imbalanced).
  • The class of transport maps used must be rich enough but not so complex as to prevent estimation at reasonable rates.

Potential limitations include computational challenges in estimating barycentric transport maps in high dimension and slower convergence when output support is non-smooth or high-dimensional.

7. Implications for Practice and Future Research

This framework shifts the focus of fair regression: practitioners should continue to invest in improving predictive accuracy using any unconstrained regression technique, knowing that a simple, theoretically-validated post-processing step can impart fairness under demographic parity, to the extent allowed by the underlying data and complexity of the transport step.

The meta-theorem’s modularity suggests it can be extended to new regression modalities (e.g., probabilistic deep learning, manifold-valued outputs) wherever the minimax regression rate and transport map complexity can be characterized. Developing more computationally tractable and practically efficient barycentric transport estimation, especially in high dimensions, remains an active area of research.


Summary Table: Fair Minimax Error Meta-Theorem

Source of Error | Rate / Contribution | Notes
Regression | $L^2 \sum_{s=1}^M w_s \mathcal{E}_{n_s}(\mathcal{P}_s)$ | Minimax rate per group using an unconstrained method
Transport map | $(\tilde n / \ln \tilde n)^{-\alpha/(\alpha+\beta)}$ | Complexity of map estimation, governed by the entropy parameters $(\alpha, \beta)$
Overall | Maximum of the two terms (up to constants) | Tight up to constants; optimal under the DP constraint

In conclusion, meta-optimal approaches grounded in post-processing and optimal transport theory provide a robust, statistically justified, and modular pathway for implementing fair minimax optimal regression across a wide spectrum of models and data settings under demographic parity.