A Critical Examination of Fairness in Disentangled Representations
The paper "On the Fairness of Disentangled Representations" addresses a pressing concern in the field of machine learning: the fairness of machine learning models, particularly focusing on the role of disentangled representations. Disentangled representation learning has gained significant traction due to its purported advantages in interpretability, generalization, and accelerated downstream learning. The authors of this paper scrutinize the often hypothesized potential of disentangled representations to enhance fairness in prediction tasks, especially when sensitive variables are not observed.
Approach and Findings
The paper begins by investigating whether different notions of disentanglement can indeed bolster fairness in downstream tasks. The framework considers a scenario in which predictions are made from representations of high-dimensional observations, and those observations are generated from both the target variable and an unobserved sensitive variable. A critical finding is that even when the target and sensitive variables are independent, predictions can still be unfair: the unknown mixing mechanism that produces the observations makes the target and sensitive variables dependent once we condition on the raw data or on any representation computed from it, so a classifier can implicitly exploit information about the sensitive variable.
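One way to make this setup concrete is sketched below; the notation is introduced here for illustration, and the total-variation score is how we read the paper's quantification of demographic-parity violations rather than a verbatim reproduction of its definitions.

```latex
% Unknown mixing mechanism g generates observations x from independent
% underlying factors, including the target y and the unobserved sensitive
% variable s (plus remaining factors z):
x = g(y, s, z), \qquad y \perp s .
% A representation r(x) is learned from x alone, and a downstream classifier
% predicts \hat{y} = f(r(x)). Demographic parity requires
p(\hat{y} \mid s = \bar{s}) = p(\hat{y}) \quad \text{for all groups } \bar{s},
% and its violation can be scored by the total variation distance between the
% overall and group-conditional prediction distributions, averaged over groups:
\mathrm{unfairness}(\hat{y}) = \frac{1}{|S|} \sum_{\bar{s} \in S}
  \mathrm{TV}\big( p(\hat{y}),\; p(\hat{y} \mid s = \bar{s}) \big).
```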
Key contributions of this paper include:
- Evidence of Unfairness in Entangled Settings: Both theoretical analysis and an empirical evaluation of more than 12,600 trained models show that even optimal classifiers can fail to be fair when they operate on entangled representations. Unfairness is quantified via demographic parity across a range of scenarios (a minimal sketch of this measurement follows the list).
- Correlation Between Disentanglement and Fairness: The authors assess the demographic parity of a wide array of prediction models and find that higher disentanglement scores are consistently associated with fairer predictions. This suggests that disentangled representations may inherently facilitate the development of fairer models, with the association being strongest for the DCI Disentanglement score.
- Discrepancies Across Datasets: The research further investigates how different datasets exhibit varying degrees of unfairness. The high variability in unfairness scores across datasets highlights that not all disentangled representations contribute equally to fairness, posing important challenges in representation learning.
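As a concrete illustration of the first point, the following is a minimal sketch of how demographic-parity unfairness can be measured for a trained classifier. The function name, the two-group toy data, and the simple averaging over groups are choices made here for illustration, not the authors' evaluation code.

```python
import numpy as np

def demographic_parity_unfairness(y_pred, s, n_classes):
    """Average total-variation distance between the overall prediction
    distribution p(y_hat) and the group-conditional distributions p(y_hat | s).
    A score of 0 means demographic parity holds exactly."""
    y_pred = np.asarray(y_pred)
    s = np.asarray(s)
    # Overall distribution of predicted labels.
    p_overall = np.bincount(y_pred, minlength=n_classes) / len(y_pred)
    tv_distances = []
    for group in np.unique(s):
        preds_in_group = y_pred[s == group]
        p_group = np.bincount(preds_in_group, minlength=n_classes) / len(preds_in_group)
        # Total variation distance = half the L1 distance between the distributions.
        tv_distances.append(0.5 * np.abs(p_overall - p_group).sum())
    return float(np.mean(tv_distances))

# Toy usage: predictions that depend on the (hidden) sensitive group are unfair
# even if the true target is independent of that group.
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=1000)                          # sensitive group, unobserved at training time
y_pred = (rng.random(1000) < 0.3 + 0.4 * s).astype(int)    # predictions skewed by group
print(demographic_parity_unfairness(y_pred, s, n_classes=2))
```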
Analytical Discourse
The paper undertakes a meticulous evaluation of disentanglement metrics, conjecturing that disentangled representations might isolate the information related to sensitive attributes in separate dimensions, which downstream models could then leave unused, thereby preserving fairness. This hypothesis is tested empirically across multiple datasets and disentanglement scores, with the strongest correlation reported for the DCI Disentanglement metric.
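The kind of analysis behind this claim can be sketched as follows, assuming a disentanglement score and an unfairness score have already been computed for each trained model. The numbers below are made up purely for illustration and are not results from the paper.

```python
from scipy.stats import spearmanr

# Hypothetical per-model scores for a handful of trained models
# (illustrative values only).
unfairness = [0.19, 0.16, 0.14, 0.15, 0.10, 0.09, 0.07]
disentanglement_scores = {
    "BetaVAE": [0.60, 0.55, 0.70, 0.62, 0.75, 0.68, 0.80],
    "MIG":     [0.10, 0.15, 0.22, 0.18, 0.30, 0.27, 0.35],
    "DCI":     [0.21, 0.30, 0.42, 0.38, 0.61, 0.66, 0.78],
}

# Rank-correlate each disentanglement metric with unfairness; a more negative
# coefficient means higher disentanglement tends to go with fairer predictions.
for metric, scores in disentanglement_scores.items():
    rho, _ = spearmanr(scores, unfairness)
    print(f"{metric:8s} rank correlation with unfairness: {rho:+.2f}")
```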
The researchers go further by adjusting the unfairness scores for downstream prediction performance, so that accuracy is ruled out as a confounding variable. Even after this adjustment a positive correlation persists, albeit a weaker one, suggesting that disentanglement may contribute to fairness beyond what is explained by accuracy alone.
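One way such an adjustment can be carried out is sketched below: regress unfairness on downstream accuracy and correlate the residuals with the disentanglement score. This is an assumption-laden illustration; the paper's exact adjustment procedure may differ, and the numbers are again purely illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def residualize(values, covariate):
    """Remove the linear effect of `covariate` from `values` via least squares."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coeffs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coeffs

# Hypothetical per-model measurements (illustrative values only).
accuracy   = np.array([0.71, 0.74, 0.78, 0.80, 0.85, 0.88, 0.91])
unfairness = np.array([0.20, 0.18, 0.15, 0.16, 0.11, 0.10, 0.08])
dci_scores = np.array([0.25, 0.31, 0.44, 0.50, 0.63, 0.70, 0.79])

# Correlate disentanglement with the part of unfairness not explained by accuracy.
adjusted_unfairness = residualize(unfairness, accuracy)
rho, _ = spearmanr(dci_scores, adjusted_unfairness)
print(f"accuracy-adjusted rank correlation: {rho:+.2f}")
```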
Implications and Future Directions
From a practical perspective, the findings suggest that careful attention should be paid to the choice of representations in the context of fairness-sensitive applications. Disentangled representations, while beneficial in numerous contexts, need further scrutiny to identify how they can be optimized or selected to inherently support fair predictions.
Theoretical implications of this work point to a need for better understanding the causal relationship between disentanglement and fairness. Future research could explore more granular disentanglement frameworks capable of accommodating dependencies between factors of variation, potentially through integrative approaches that leverage domain knowledge in the form of constraints or auxiliary tasks.
Overall, this paper marks a structured effort to link concepts from interpretability-driven machine learning with fairness concerns, setting a foundational pathway for future work dealing with the ethical and societal implications of machine learning models. The result is a nuanced view that while disentangled representations can be a step towards fairness, their deployment needs careful calibration and contextual awareness.