Differentially Private Post-Processing for Fair Regression (2405.04034v1)

Published 7 May 2024 in cs.LG, cs.CR, and cs.CY

Abstract: This paper describes a differentially private post-processing algorithm for learning fair regressors satisfying statistical parity, addressing privacy concerns of machine learning models trained on sensitive data, as well as fairness concerns of their potential to propagate historical biases. Our algorithm can be applied to post-process any given regressor to improve fairness by remapping its outputs. It consists of three steps: first, the output distributions are estimated privately via histogram density estimation and the Laplace mechanism, then their Wasserstein barycenter is computed, and the optimal transports to the barycenter are used for post-processing to satisfy fairness. We analyze the sample complexity of our algorithm and provide fairness guarantee, revealing a trade-off between the statistical bias and variance induced from the choice of the number of bins in the histogram, in which using less bins always favors fairness at the expense of error.

Authors (4)
  1. Ruicheng Xian (9 papers)
  2. Qiaobo Li (3 papers)
  3. Gautam Kamath (68 papers)
  4. Han Zhao (159 papers)

Summary

  • The paper introduces a post-processing algorithm that adjusts regression outputs to attain statistical parity while preserving individual privacy.
  • It employs density estimation combined with Wasserstein barycenter computation and optimal transport mapping to align group-specific distributions.
  • Theoretical analyses and experiments reveal trade-offs among fairness, accuracy, and privacy, offering promising directions for future research.

Exploring Differentially Private Algorithms for Fair Regression

Understanding Differentially Private Fair Regression

As machine learning increasingly intersects with privacy and fairness, methods that deliver both fair predictions and privacy for the underlying data are central to ethical AI. This paper proposes a post-processing algorithm that remaps the outputs of a trained regression model to satisfy a fairness criterion, statistical parity, while guaranteeing that the procedure used to learn the remapping is differentially private.

The fairness criterion of interest, statistical parity, requires that predictions be distributed similarly across groups defined by sensitive attributes (such as race or gender), so that no group systematically receives higher or lower predictions. For regression, statistical parity holds when the distribution of the regressor's outputs is (approximately) the same for every group.
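
As a quick illustration (a minimal sketch on synthetic predictions, not code from the paper), the degree of statistical parity violation in regression can be audited by comparing the prediction distributions across groups, for example with the Wasserstein distance:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Hypothetical predictions from a regressor for two demographic groups.
preds_group_a = rng.normal(0.45, 0.10, 5000)
preds_group_b = rng.normal(0.60, 0.15, 5000)

# Statistical parity asks these two distributions to match; the Wasserstein
# distance between them is one way to quantify the violation (0 = perfect parity).
gap = wasserstein_distance(preds_group_a, preds_group_b)
print(f"statistical parity gap (Wasserstein-1): {gap:.3f}")
```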

Differential privacy, on the other hand, ensures that including or excluding any single record in the dataset does not significantly change the algorithm's output distribution, thereby masking the presence or absence of any individual. This provides a mathematical guarantee of privacy, which is crucial when training on sensitive data.
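
For reference (the standard textbook definition, stated here rather than quoted from the paper): a randomized algorithm $\mathcal{M}$ is $\varepsilon$-differentially private if, for every pair of datasets $D, D'$ differing in one individual's record and every set of outcomes $S$,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S].
```

The Laplace mechanism used in the algorithm's first step achieves this by adding noise drawn from $\mathrm{Lap}(\Delta/\varepsilon)$ to a statistic with $\ell_1$-sensitivity $\Delta$ (here, histogram bin counts).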

Core Mechanism and Steps

The proposed algorithm proceeds in three steps to produce fair and private predictions (a minimal code sketch follows the list):

  1. Density Estimation: First, it privately estimates the output distribution of the initially trained regressor for each group using histogram density estimation combined with the Laplace mechanism. This ensures that the information about individual data points is obscured, adhering to differential privacy.
  2. Wasserstein Barycenter: The second step computes the Wasserstein barycenter of the estimated distributions, i.e., the distribution that minimizes the (weighted) average Wasserstein distance to the group-specific distributions. It serves as the common target toward which every group's outputs will be moved.
  3. Optimal Transport Mapping: Finally, the optimal transport map from each group's estimated distribution to the barycenter is used to remap that group's predictions. This adjustment aligns the output distributions across groups, which is exactly what statistical parity requires.
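
The sketch below illustrates the three steps for one-dimensional regression outputs on a bounded range. The function names, the toy data, the fixed output range, and the exact Laplace noise calibration are assumptions made for illustration; this is not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def private_histogram(scores, n_bins, eps, lo=0.0, hi=1.0):
    """Step 1: histogram density estimate of one group's regressor outputs,
    privatized by adding Laplace noise to the bin counts."""
    counts, edges = np.histogram(scores, bins=n_bins, range=(lo, hi))
    # Replacing one individual's record moves at most two bin counts by one,
    # so Laplace noise of scale 2/eps per count is a standard calibration
    # (an assumption here; the paper's exact constants may differ).
    noisy = np.clip(counts + rng.laplace(scale=2.0 / eps, size=n_bins), 0, None)
    total = noisy.sum()
    probs = noisy / total if total > 0 else np.full(n_bins, 1.0 / n_bins)
    return probs, edges


def hist_quantile(probs, edges, q):
    """Inverse CDF of a histogram distribution at quantile levels q."""
    cdf = np.concatenate([[0.0], np.cumsum(probs)])
    return np.interp(q, cdf, edges)


def hist_cdf(probs, edges, y):
    """CDF of a histogram distribution at points y."""
    cdf = np.concatenate([[0.0], np.cumsum(probs)])
    return np.interp(y, edges, cdf)


# Toy data: outputs of a pretrained regressor, split by a sensitive attribute.
scores = {
    "A": np.clip(rng.normal(0.45, 0.10, 4000), 0.0, 1.0),
    "B": np.clip(rng.normal(0.60, 0.15, 2500), 0.0, 1.0),
}
sizes = {g: len(s) for g, s in scores.items()}
n_total = sum(sizes.values())

# Step 1: private density estimate per group.
hists = {g: private_histogram(s, n_bins=32, eps=1.0) for g, s in scores.items()}

# Step 2: Wasserstein barycenter. For 1-D distributions, the barycenter's
# quantile function is the group-weighted average of the group quantile functions.
levels = np.linspace(0.0, 1.0, 513)
bary_q = sum((sizes[g] / n_total) * hist_quantile(*hists[g], levels) for g in hists)


# Step 3: optimal-transport post-processing T_g(y) = Q_bar(F_g(y)): push each
# group's prediction through its own CDF, then through the barycenter's inverse
# CDF, so all groups end up with (approximately) the same output distribution.
def post_process(y, group):
    probs, edges = hists[group]
    return np.interp(hist_cdf(probs, edges, y), levels, bary_q)


print(post_process(0.45, "A"), post_process(0.60, "B"))  # mapped near each other
```

The quantile-averaging barycenter and CDF-composition transport maps are the standard closed forms for univariate distributions; the noise and discretization introduced by the private histograms are what drive the bin-count trade-off discussed below.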

Trade-offs and Implications

This approach inherently balances fairness, accuracy, and privacy. The number of histogram bins controls a bias-variance trade-off: fewer bins mean each bin absorbs less Laplace noise, which favors fairness, but the coarser estimate distorts the output distribution and increases regression error; more bins reduce this discretization bias but accumulate more noise, which can weaken the fairness guarantee. The paper shows that using fewer bins always favors fairness at the expense of error.
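
As a back-of-the-envelope illustration of why the bin count matters (a rough accounting under the noise calibration assumed in the sketch above, not the paper's bounds): with $n$ samples in a group, $m$ bins over a unit output range, and privacy budget $\varepsilon$,

```latex
\underbrace{\text{discretization error} \;\lesssim\; \tfrac{1}{2m}}_{\text{shrinks with more bins}}
\qquad\text{vs.}\qquad
\underbrace{\text{total Laplace noise on the histogram} \;\approx\; \tfrac{2m}{n\varepsilon}}_{\text{grows with more bins}}
```

Fewer bins keep the noise term small, preserving the fairness of the learned transport maps, at the cost of a coarser approximation of the output distribution and hence higher regression error.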

The dual goals of privacy preservation and fairness attainment do not always align. Enforcing differential privacy injects noise into the estimated output distributions, which can distort the barycenter and the transport maps and thereby weaken the fairness guarantee. The paper illustrates this with theoretical analysis and numerical experiments showing how parameters such as the number of bins and the privacy budget affect the resulting error and disparity.

Theoretical Contributions and Numerical Results

The authors provide a rigorous theoretical analysis establishing the algorithm's privacy and fairness guarantees. They also analyze its sample complexity, characterizing how the error and fairness bounds improve as the sample size grows.

Experimentally, the results are promising. Applying the algorithm to benchmark datasets such as Law School and Communities and Crime, the paper shows that a reasonable balance between regression error and fairness can be maintained under various privacy budgets.

Future Directions

Looking ahead, an interesting extension would be to adapt the algorithm to settings where sensitive attributes are unobserved at prediction time. This would require inferring those attributes from other available features, which could introduce additional error and affect both the privacy and fairness guarantees.

Additionally, the real-world applicability of this algorithm needs thorough evaluation, especially in high-stakes domains like healthcare and criminal justice where fairness and privacy are paramount.

In conclusion, this paper makes significant strides in addressing the crucial intersection of machine learning fairness and privacy. It introduces a theoretically sound and experimentally validated approach that can guide future work in developing privacy-aware, fair machine learning models.
