
Fair Minimax Optimal Regression

Updated 30 June 2025
  • Fair minimax optimal regression is a framework that designs regression estimators to minimize the maximum error while ensuring demographic parity across groups.
  • It leverages optimal transport to adjust outputs from standard regressors, achieving fairness through a principled post-processing step.
  • A meta-theorem supports the approach by providing universal error bounds that mix conventional regression risk with the complexity of transport map estimation.

Fair minimax optimal regression is the principle and practice of constructing regression estimators that achieve, subject to fairness constraints such as demographic parity, the lowest worst-case (maximal) loss across a family of data generating distributions or regression models. The topic encompasses the theoretical lower bounds (minimax risks) under fairness constraints, sharp upper bounds delivered by practical estimators, the correct post-processing architectures to ensure fairness, and general meta-theorems that validate minimax optimality in a model-agnostic manner.

1. Fair Minimax Optimal Regression: Core Concepts and Framework

In regression with multiple groups (indexed $s = 1, \dots, M$), demographic parity requires a regressor's output distribution to be identical across groups, that is, for all group pairs $s, s'$ and measurable sets $E$,

$$\mathbb{P}\big[ f_s(X^{(s)}) \in E \big] = \mathbb{P}\big[ f_{s'}(X^{(s')}) \in E \big].$$

The fair minimax optimal regression problem is then: for a given collection of possible data generating distributions, find the estimator minimizing the maximum mean squared error—measured against the fair Bayes-optimal regressor (i.e., the best achievable under demographic parity)—across all models in the class.
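As an illustration, demographic parity can be checked empirically by comparing a predictor's output distributions across groups, for example with a two-sample Kolmogorov–Smirnov statistic. The following is a sketch with our own function names and synthetic data, not code from the source:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: sup_t |F_a(t) - F_b(t)| over a common grid."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    F_a = np.searchsorted(a, grid, side="right") / len(a)
    F_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(F_a - F_b))

rng = np.random.default_rng(0)
# Outputs of a hypothetical regressor on two groups (synthetic).
out_s = rng.normal(0.0, 1.0, size=5000)
out_t = rng.normal(0.5, 1.0, size=5000)  # shifted: parity violated
print(ks_statistic(out_s, out_t))  # large value -> demographic parity violated
print(ks_statistic(out_s, out_s))  # 0 -> identical output distributions
```

A statistic near zero indicates (approximately) identical output laws across the two groups, which is the empirical analogue of the displayed parity condition.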

The accuracy of a (possibly random) predictive rule is quantified by

$$d^2_{\mu_{X,:}}(f_{:}, \bar{f}^*_{\mu,:}) = \sum_{s=1}^M w_s \int \big(f_s(x) - \bar{f}^*_{\mu,s}(x)\big)^2 \, \mu_{X,s}(dx),$$

where $\bar{f}^*_{\mu,s}$ is the fair Bayes-optimal regressor for group $s$ and $w_s$ are the group weights.
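An empirical analogue of this weighted groupwise distance is a $w_s$-weighted average of per-group mean squared deviations from a reference (e.g., fair Bayes-optimal) predictor. A minimal sketch, with names and toy arrays of our own choosing:

```python
import numpy as np

def weighted_group_mse(preds, refs, weights):
    """Empirical d^2: preds[s] and refs[s] are predictions on group s's samples."""
    return sum(w * np.mean((p - r) ** 2)
               for w, p, r in zip(weights, preds, refs))

# Toy data: two groups, two samples each.
preds = [np.array([1.0, 2.0]), np.array([0.0, 4.0])]
refs  = [np.array([1.0, 1.0]), np.array([0.0, 2.0])]
print(weighted_group_mse(preds, refs, weights=[0.5, 0.5]))  # 0.5*0.5 + 0.5*2.0 = 1.25
```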

The fair minimax error is thus

$$\bar{\mathcal{E}}_n(\mathcal{P}) = \inf_{\bar{f}_{n,:} \,\text{fair}} \; \sup_{\mu_{:} \in \mathcal{P}} \; \mathbb{E}_{\mu^n_{:}} \big[ d^2_{\mu_{X,:}}(\bar{f}_{n,:}, \bar{f}^*_{\mu,:}) \big].$$

2. Construction via Optimal Transport and Barycenters

For many regression settings, the population-level fair optimal predictor can be constructed via optimal transport maps to a Wasserstein barycenter of the groupwise predictive output distributions. Specifically, if $\mu_{f,s}$ is the law of the Bayes regressor output for group $s$, then

$$\bar{f}^*_{\mu, s}(x) = (\vartheta^*_{\mu_f, s} \circ f^*_{\mu, s})(x),$$

where $\vartheta^*_{\mu_f, s}$ is the optimal transport map from $\mu_{f,s}$ to the barycenter (weighted by $w_s$) of $\{\mu_{f,1}, \dots, \mu_{f,M}\}$.

This structure enables a post-processing approach: after fitting any regression model, apply transport maps (identified via empirical optimal transport to the barycenter) to achieve demographic parity.
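In the univariate case this construction is especially transparent: the $w_s$-weighted Wasserstein-2 barycenter has quantile function equal to the weighted average of the groupwise quantile functions, and the transport map from group $s$ is the composition of the group's CDF with the barycenter's quantile function. A self-contained sketch under these standard 1D facts (names and synthetic data are ours):

```python
import numpy as np

def barycenter_maps(samples_by_group, weights, n_quantiles=1000):
    """Return one callable transport map per group to the 1D W2 barycenter."""
    q = np.linspace(0.0, 1.0, n_quantiles)
    # Groupwise empirical quantile functions on a shared grid.
    quantiles = [np.quantile(s, q) for s in samples_by_group]
    # Barycenter quantile function: weighted average of group quantiles.
    bar_q = sum(w * Q for w, Q in zip(weights, quantiles))

    def make_map(s):
        s_sorted = np.sort(s)
        def transport(x):
            # Group's empirical CDF, then the barycenter quantile function.
            u = np.searchsorted(s_sorted, x, side="right") / len(s_sorted)
            return np.interp(u, q, bar_q)
        return transport

    return [make_map(s) for s in samples_by_group]

rng = np.random.default_rng(1)
groups = [rng.normal(0, 1, 4000), rng.normal(2, 1, 4000)]
maps = barycenter_maps(groups, weights=[0.5, 0.5])
mapped = [m(g) for m, g in zip(maps, groups)]
# Both mapped groups now share the barycenter distribution (mean near 1.0).
print(np.mean(mapped[0]), np.mean(mapped[1]))
```

After the mapping, both groups' outputs follow (approximately) the same barycenter law, which is exactly the demographic parity property the post-processing step is designed to enforce.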

3. Meta-Theorem: Error Bound for Demographic Parity-Constrained Regression

The paper presents a meta-theorem that provides a universal upper bound for fair minimax error in terms of:

  • The conventional regression minimax error $\mathcal{E}_{n_s}(\mathcal{P}_s)$ for each group $s$, and
  • The complexity (metric entropy) of the transport map class used in the post-processing step.

Assume the family of transport maps has entropy parameters $(\alpha, \beta)$ and let $\tilde n = \min_s n_s / w_s$. Then there exists a universal constant $C$ such that

$$\bar{\mathcal{E}}_n(\mathcal{P}) \leq C \left( L^2 \sum_{s=1}^M w_s \, \mathcal{E}_{n_s}(\mathcal{P}_s) + \left( \frac{\tilde n}{\ln \tilde n} \right)^{-\alpha/(\alpha+\beta)} \right),$$

where $L$ is a universal Lipschitz constant for the map class.

Interpretation: The risk in fair regression may match the unconstrained minimax rate or be dominated by the statistical hardness of optimal transport map estimation, depending on the dimensionality and regularity of the output space.
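To make the trade-off concrete, one can compare the two terms of the bound numerically. The smoothness-based regression rate $n^{-2\beta_r/(2\beta_r + d)}$ below is the standard nonparametric benchmark; the specific parameter values are toy numbers of our own choosing, not from the source:

```python
import math

def regression_rate(n, smooth, d):
    """Standard nonparametric minimax MSE rate for Holder smoothness `smooth` in d dims."""
    return n ** (-2 * smooth / (2 * smooth + d))

def transport_rate(n_tilde, alpha, beta):
    """Transport-map estimation term from the meta-theorem bound."""
    return (n_tilde / math.log(n_tilde)) ** (-alpha / (alpha + beta))

n = 100_000
print(regression_rate(n, smooth=2, d=1))   # fast regime: n^{-4/5} = 1e-4
print(transport_rate(n, alpha=1, beta=1))  # slower: roughly (n / ln n)^{-1/2}
```

With these toy parameters the transport term dominates, illustrating the regime where map estimation, not regression, sets the overall rate.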

4. Post-Processing Algorithm for Fair Minimax Regression

The optimal fair regression procedure has the following structure:

  1. Fit a conventional regressor $f_{n,s}$ to each group $s$, using any minimax-optimal regression method (e.g., least squares, deep neural networks, nonparametric estimators).
  2. Estimate the empirical transport maps $\vartheta_{n,s}$ from each group's output distribution to the empirical barycenter, using a regularized potential learning method (e.g., as in Korotin et al. for Wasserstein barycenters).
  3. Compose to obtain the fair regressor: $\bar{f}_{n,s}(x) = \vartheta_{n,s}(f_{n,s}(x))$.
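The three steps above can be sketched end to end. This is our own simplification, not the paper's implementation: a single feature, least-squares base regressors, and univariate outputs, so the barycentric transport reduces to quantile matching rather than the regularized potential learning the text describes:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_group(x, y):
    """Step 1: conventional least-squares regressor for one group."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return lambda x_new: slope * x_new + intercept

# Synthetic groups with different output locations and scales.
x0, x1 = rng.uniform(0, 1, 3000), rng.uniform(0, 1, 3000)
y0 = 1.0 * x0 + rng.normal(0, 0.1, 3000)
y1 = 2.0 * x1 + 1.0 + rng.normal(0, 0.1, 3000)

f0, f1 = fit_group(x0, y0), fit_group(x1, y1)
out0, out1 = f0(x0), f1(x1)

# Step 2: empirical transport maps to the W2 barycenter (quantile averaging).
q = np.linspace(0.0, 1.0, 500)
bar_q = 0.5 * np.quantile(out0, q) + 0.5 * np.quantile(out1, q)

def to_barycenter(out):
    out_sorted = np.sort(out)
    def theta(x_out):
        u = np.searchsorted(out_sorted, x_out, side="right") / len(out_sorted)
        return np.interp(u, q, bar_q)
    return theta

theta0, theta1 = to_barycenter(out0), to_barycenter(out1)

# Step 3: compose to obtain the fair regressor f_bar = theta o f.
fair0, fair1 = theta0(f0(x0)), theta1(f1(x1))
print(np.mean(fair0), np.mean(fair1))  # near-identical after post-processing
```

The composed predictors keep each group's within-group ranking (the 1D maps are monotone) while sharing a common output distribution, which is the demographic parity guarantee.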

This framework is modular: advances in unconstrained regression plug in directly, with the overall error guarantee governed by whichever of regression and transport map estimation is statistically harder.

5. Quantitative Improvements and Applicability

The meta-theorem allows for sub-root-$n$ rates in regular or smooth regimes and is applicable to a vastly broader family of data generation models than previous analyses. It directly implies that minimax optimality under a fairness constraint can be achieved by post-processing any minimax-efficient regressor, provided the transport step is implemented with sufficient statistical precision.

For example, if regressor outputs are univariate, transport maps are monotonic and thus simple to estimate, so the excess error due to fairness is negligible compared to the regression estimation error. For high-dimensional or less regular output spaces, map estimation error may dominate, as precisely quantified by the entropy parameters $(\alpha, \beta)$.

6. Assumptions and Limitations

The theoretical guarantees rest on several key requirements:

  • The regression model class $\mathcal{P}$ must admit known minimax rates.
  • The output distributions under the Bayes optimal regressors (per group) must satisfy regularity conditions, typically boundedness and a Poincaré-type inequality.
  • There must be sufficient samples per group (i.e., group sample sizes should not be too imbalanced).
  • The class of transport maps used must be rich enough but not so complex as to prevent estimation at reasonable rates.

Potential limitations include computational challenges in estimating barycentric transport maps in high dimension and slower convergence when output support is non-smooth or high-dimensional.

7. Implications for Practice and Future Research

This framework shifts the focus of fair regression: practitioners should continue to invest in improving predictive accuracy using any unconstrained regression technique, knowing that a simple, theoretically-validated post-processing step can impart fairness under demographic parity, to the extent allowed by the underlying data and complexity of the transport step.

The meta-theorem’s modularity suggests it can be extended to new regression modalities (e.g., probabilistic deep learning, manifold-valued outputs) wherever the minimax regression rate and transport map complexity can be characterized. Developing more computationally tractable and practically efficient barycentric transport estimation, especially in high dimensions, remains an active area of research.


Summary Table: Fair Minimax Error Meta-Theorem

Source of Error | Rate / Contribution | Notes
Regression | $L^2 \sum_{s=1}^M w_s \mathcal{E}_{n_s}(\mathcal{P}_s)$ | Minimax rate per group using an unconstrained method
Transport map | $(\tilde n / \ln \tilde n)^{-\alpha/(\alpha+\beta)}$ | Complexity of map estimation, governed by the entropy parameters $(\alpha, \beta)$
Overall | Maximum of the two terms (up to constants) | Tight up to constants; optimal under the DP constraint

In conclusion, meta-optimal approaches grounded in post-processing and optimal transport theory provide a robust, statistically justified, and modular pathway for implementing fair minimax optimal regression across a wide spectrum of models and data settings under demographic parity.