Tuning the Distortion Parameter in Optimized Pre-processing

Determine a principled method for selecting the distortion-control parameter c_{z,x,y} in the optimization-based pre-processing framework of Calmon et al. (2017), which transforms a dataset with features X, labels Y, and protected attributes Z into a repaired dataset, so that practitioners have concrete guidance on choosing the parameter instead of setting it ad hoc.

Background

The paper reviews optimized pre-processing methods for algorithmic fairness, which learn a randomized mapping p(\tilde{X}, \tilde{Y} | X, Y, Z) that transforms the original data so as to reduce disparate impact while limiting the distortion applied to any individual record. In this formulation, the parameter c_{z,x,y} upper-bounds the distortion permitted for a record with protected attribute z, features x, and label y, and thereby mediates the trade-off between satisfying the fairness constraints and staying faithful to the original data.
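To make the role of c_{z,x,y} concrete, the sketch below checks a candidate mapping against a per-record bound in a toy discrete setting. It assumes the expected-distortion form of the constraint, E[\delta((x, y), (\tilde{X}, \tilde{Y})) | Z = z, X = x, Y = y] \le c_{z,x,y}, and fixes a single value of z. The state space, the distortion function `distortion`, and the helpers `label_flip_mapping`, `expected_distortion`, and `satisfies_distortion_bound` are hypothetical illustrations, not code from Boswell et al. or Calmon et al.

```python
import numpy as np

# Hypothetical toy state space for one fixed protected value z:
# a binary feature x and a binary label y, giving four (x, y) states.
STATES = [(x, y) for x in (0, 1) for y in (0, 1)]


def distortion(orig, new):
    """Hypothetical distortion delta: number of entries that changed (0, 1 or 2)."""
    return sum(a != b for a, b in zip(orig, new))


def label_flip_mapping(q):
    """Randomized mapping that keeps (x, y) w.p. 1 - q and flips y w.p. q.

    Row i is the distribution p(X~, Y~ | X, Y = STATES[i]) at the fixed z.
    """
    mapping = np.zeros((len(STATES), len(STATES)))
    for i, (x, y) in enumerate(STATES):
        mapping[i, i] = 1.0 - q
        mapping[i, STATES.index((x, 1 - y))] = q
    return mapping


def expected_distortion(mapping):
    """E[delta((x, y), (X~, Y~)) | X = x, Y = y] for every state (one per row)."""
    delta = np.array([[distortion(s, t) for t in STATES] for s in STATES])
    return (mapping * delta).sum(axis=1)


def satisfies_distortion_bound(mapping, c):
    """True if every state's expected distortion is at most its bound.

    c may be a scalar or an array with one bound per state, i.e. c_{z,x,y}.
    """
    return bool(np.all(expected_distortion(mapping) <= np.asarray(c) + 1e-12))


m = label_flip_mapping(0.1)
print(expected_distortion(m))
print(satisfies_distortion_bound(m, 0.2), satisfies_distortion_bound(m, 0.05))
```

With a 10% label-flip rate every state has expected distortion 0.1, so this mapping is feasible under c = 0.2 but not under c = 0.05: the value chosen for c_{z,x,y} decides which mappings the optimizer is allowed to consider at all.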

The authors note that there is no guidance on how to tune this distortion-control parameter, so implementers may end up choosing it arbitrarily. Establishing a principled procedure for selecting c_{z,x,y} therefore remains an open problem, and one that bears on both the technical performance of the method and its legal acceptability.
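Boswell et al. do not propose a tuning rule, and the sketch below does not supply one. It only illustrates, in a hypothetical two-group, label-only repair, why an arbitrary choice matters. Records with Z = 0 and Y = 0 may have their label flipped to 1 with probability q; each flip costs a distortion of 1, so the bound on those records reduces to q \le c, and the largest feasible q determines how much of the gap between the positive-label rates r0 (for Z = 0) and r1 (for Z = 1) can be closed. The function `repair_tradeoff` and the example rates are assumptions made for illustration.

```python
def repair_tradeoff(r0, r1, c):
    """Hypothetical toy repair: flip Y from 0 to 1 for Z = 0 records at rate q <= c.

    r0, r1: positive-label rates P(Y = 1 | Z = 0) and P(Y = 1 | Z = 1).
    Returns (q, residual_gap), where residual_gap is
    P(Y~ = 1 | Z = 1) - P(Y~ = 1 | Z = 0) after the repair.
    """
    if not 0 <= r0 < r1 <= 1 or c < 0:
        raise ValueError("toy assumes 0 <= r0 < r1 <= 1 and c >= 0")
    q_needed = (r1 - r0) / (1 - r0)  # flip rate that fully equalizes the rates
    q = min(c, q_needed)             # the distortion bound caps the flip rate
    return q, r1 - (r0 + (1 - r0) * q)


# Sweep candidate bounds: a larger c buys a smaller gap at the cost of more distortion.
for c in (0.0, 0.1, 0.2, 0.3, 0.5):
    q, gap = repair_tradeoff(0.3, 0.6, c)
    print(f"c={c:.1f}  flip rate={q:.3f}  residual gap={gap:.3f}")
```

Under these assumed rates, c = 0 leaves the full 0.3 gap, while any c of at least 3/7 (about 0.43) closes it; every value in between is a different fairness/fidelity compromise, and nothing in the framework itself says which one to pick.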

References

It is also unclear how to tune $c_{z, x, y}$. In practice, it seems like an implementor of this method could arbitrarily set the value without any legal oversight.

On the Fairness of 'Fake' Data in Legal AI (2009.04640 - Boswell et al., 2020) in Subsection 3.2 (Optimised Pre-processing)