Analyzing "Censoring Representations with an Adversary"
The paper "Censoring Representations with an Adversary" by Harrison Edwards and Amos Storkey presents an innovative approach to addressing sensitive information issues in machine learning by utilizing adversarial techniques to learn fair and private representations. The authors propose solutions to two related challenges: fair decision-making free from discrimination and image anonymization by removing sensitive information from data representations.
Adversarial Learned Fair Representations (ALFR)
The paper introduces Adversarial Learned Fair Representations (ALFR), a method designed to ensure that machine learning models make fair predictions independent of sensitive attributes. ALFR frames this problem as a minimax optimization task where the objective is to learn data representations that obscure sensitive variables while maintaining prediction accuracy.
The adversarial approach trains a critic network to predict the sensitive attribute from the learned representation, while the encoder is trained to degrade the critic's performance, thereby scrubbing sensitive information from the representation. This is formalized as a minimax problem: find representations that remain discriminative for the target task while giving the adversary as little information as possible about the sensitive variable.
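In the notation of a generic encoder–decoder–predictor–adversary setup (the symbols below paraphrase the paper's combined objective rather than reproduce its exact notation), the problem can be written as

$$\min_{\theta_{\mathrm{enc}},\,\theta_{\mathrm{dec}},\,\theta_{\mathrm{pred}}}\;\max_{\theta_{\mathrm{adv}}}\;\mathbb{E}_{(x,\,s,\,y)}\Big[\alpha\,C\big(x,\ \mathrm{Dec}(\mathrm{Enc}(x))\big)\;+\;\beta\,D\big(\mathrm{Adv}(\mathrm{Enc}(x)),\ s\big)\;+\;\gamma\,E\big(\mathrm{Pred}(\mathrm{Enc}(x)),\ y\big)\Big],$$

where $C$ is a reconstruction cost, $E$ is the prediction loss on the target $y$, $D$ is the adversary's objective on the sensitive variable $s$, and $\alpha$, $\beta$, $\gamma$ weight the three terms: the adversary adjusts its own parameters to predict $s$ as well as it can, while the encoder is adjusted to make that prediction as hard as possible.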
Image Anonymization
Building on their framework, the authors explore a novel application in image anonymization, demonstrating the flexibility of the adversarial approach. They propose a method to remove annotations from images using a modified autoencoder that separates out sensitive information. The innovation lies in training without aligned input-output pairs, utilizing separate collections of annotated and unannotated images.
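The sketch below shows one way such a setup could be wired together in PyTorch, assuming flattened images and a binary pool label s (1 = annotated, 0 = clean). The architectures, the choice to feed s to the decoder so the representation need not carry the annotation, and the test-time trick of decoding with s = 0 are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

# Illustrative flattened-image setup; sizes and architectures are assumptions.
# The annotated and clean pools are separate collections with no pixel-wise
# correspondence between them.
d_img, d_rep = 64 * 64, 128
enc = nn.Sequential(nn.Linear(d_img, d_rep), nn.ReLU())
dec = nn.Sequential(nn.Linear(d_rep + 1, d_img), nn.Sigmoid())  # decoder also sees s
adv = nn.Linear(d_rep, 1)                                       # which pool did z come from?

bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()
opt_ae = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)

def step(x_annotated, x_clean, beta=1.0):
    x = torch.cat([x_annotated, x_clean])
    s = torch.cat([torch.ones(len(x_annotated), 1), torch.zeros(len(x_clean), 1)])

    # Adversary: learn to tell annotated from clean representations.
    opt_adv.zero_grad()
    bce(adv(enc(x).detach()), s).backward()
    opt_adv.step()

    # Autoencoder: reconstruct each image (the decoder is given s, so the
    # representation itself need not encode the annotation) while making the
    # annotated and clean representations indistinguishable to the adversary.
    opt_ae.zero_grad()
    z = enc(x)
    recon = dec(torch.cat([z, s], dim=1))
    (mse(recon, x) - beta * bce(adv(z), s)).backward()
    opt_ae.step()

def anonymize(x_annotated):
    # At test time, encode an annotated image and decode with s forced to 0.
    z = enc(x_annotated)
    return dec(torch.cat([z, torch.zeros(len(z), 1)], dim=1))
```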
Comparative Analysis and Results
The authors perform extensive experiments on datasets including Adult and Diabetes from the UCI repository, comparing against the previously established Learned Fair Representations (LFR) method. On quantitative metrics such as classification accuracy and discrimination, ALFR achieves statistically significant improvements in most test settings.
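For reference, the discrimination measure typically reported in this line of work is the statistical-parity gap: the difference in positive-prediction rates between the two groups defined by the sensitive attribute. A small NumPy helper (the function name and example values are illustrative, not taken from the paper) might look like this:

```python
import numpy as np

def discrimination(y_pred, s):
    """Statistical-parity gap: absolute difference in positive-prediction
    rate between the groups defined by a binary sensitive attribute s."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s, dtype=int)
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

# Example: group s=1 has a 0.25 positive rate, group s=0 has 0.75 -> gap 0.5
print(discrimination([1, 0, 0, 0, 1, 1, 0, 1], [1, 1, 1, 1, 0, 0, 0, 0]))
```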
Formalism and Optimization
The paper details a formalism in which the fairness criterion is statistical parity between the groups defined by the sensitive attribute. The optimization is an adversarial setup in which the encoder, decoder, predictor, and adversary networks are trained jointly with stochastic gradient methods. This setup also supports semi-supervised learning, since unlabeled examples can still contribute to the reconstruction and adversarial terms, an option not commonly available in traditional methods.
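As a concrete illustration of the alternating stochastic-gradient updates (dimensions, loss weights, and optimizer choices below are illustrative assumptions, and the decoder is simplified to reconstruct from the representation alone), one training step could look like this:

```python
import torch
import torch.nn as nn

d_in, d_rep = 102, 40                 # illustrative dimensions
alpha, beta, gamma = 1.0, 1.0, 1.0    # reconstruction / adversarial / prediction weights

enc = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU())   # encoder
dec = nn.Linear(d_rep, d_in)                             # decoder (simplified)
pred = nn.Linear(d_rep, 1)                               # target-label predictor
adv = nn.Linear(d_rep, 1)                                # adversary / critic on s

bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()
opt_model = torch.optim.Adam(
    [*enc.parameters(), *dec.parameters(), *pred.parameters()], lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)

def train_step(x, s, y):
    # x: (N, d_in); s, y: (N, 1) float tensors with values in {0, 1}.

    # (1) Critic step: fit the adversary to predict s from the current representation.
    opt_adv.zero_grad()
    adv_loss = bce(adv(enc(x).detach()), s)
    adv_loss.backward()
    opt_adv.step()

    # (2) Model step: reconstruct x, predict y, and push the (now frozen)
    #     adversary's loss up, i.e. scrub information about s from the representation.
    opt_model.zero_grad()
    z = enc(x)
    model_loss = (alpha * mse(dec(z), x)
                  + gamma * bce(pred(z), y)
                  - beta * bce(adv(z), s))
    model_loss.backward()
    opt_model.step()
    return model_loss.item(), adv_loss.item()
```

For unlabeled examples, the prediction term is simply dropped while the reconstruction and adversarial terms are kept, which is what makes the semi-supervised setting straightforward in this kind of sketch.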
Implications and Future Directions
Practically, the ability to censor representations without aligned input-output pairs broadens the method's applicability in privacy-critical scenarios. Theoretically, the adversarial formulation sharpens our understanding of the interplay between data fairness and representation learning.
Future work suggested by the authors includes enhancing the stability of adversarial training and tackling complex scenarios such as removing pervasive information like gender from images. The potential of adapting these techniques across varying domains illustrates the versatility of the proposed method.
Overall, this research addresses significant challenges in privacy and fairness by innovatively applying adversarial frameworks, making substantial contributions to the field of ethical AI.