FairVis: Visual Analytics for Discovering Intersectional Bias in Machine Learning
The paper "FairVis: Visual Analytics for Discovering Intersectional Bias in Machine Learning" introduces a novel system designed to address biases in ML models, particularly those biases that occur at intersections of demographic features. FairVis is a mixed-initiative visual analytics tool that allows users to audit ML models for fairness, leveraging both algorithmic suggestions and user domain knowledge to identify potentially underperforming subgroups.
Overview of FairVis
The authors identify the growing concern around biases introduced by ML models, especially in domains involving human data such as employment, criminal justice, and healthcare. While ML models are optimized for overall accuracy, disparities in performance across different demographic groups can manifest as algorithmic bias. Recognizing the inherent difficulty in detecting intersectional biases—biases present in groups defined by combinations of features—the paper proposes FairVis as a solution.
System Components
FairVis offers several key components organized into coordinated views:
- Feature Distribution View: Provides a high-level overview of the dataset's feature distributions, allowing users to generate specific subgroups based on selected demographic features. This view helps users apply their domain knowledge to explore known subgroups.
- Subgroup Overview: Displays performance metrics across multiple subgroups in a series of dynamic strip plots. Users can customize the plots to show the metrics most relevant to their analysis, enabling them to contextualize subgroup performance relative to the entire dataset and other subgroups (a minimal sketch of computing such per-subgroup metrics follows this list).
- Suggested and Similar Subgroup View: Automatically suggests potentially biased subgroups using a novel subgroup generation technique based on clustering and entropy calculations. Users can filter and sort these suggested groups by desired metrics to prioritize investigation.
- Detailed Comparison View: Allows users to compare the performance and feature distributions of selected subgroups. This detailed analysis supports hypothesis formation regarding the causes of performance disparities.
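To make the per-subgroup analysis above concrete, here is a minimal sketch (not the authors' implementation) of defining an intersectional subgroup and computing a few of the performance metrics a tool like FairVis surfaces, such as accuracy and false positive/negative rates. It assumes a pandas DataFrame loaded from a hypothetical predictions.csv with ground-truth labels in a `label` column and binary model predictions in a `pred` column; the feature column names are purely illustrative.

```python
# Minimal sketch (not the FairVis implementation): per-subgroup performance
# metrics for a binary classifier, given ground-truth labels and predictions.
import pandas as pd


def subgroup_metrics(df, label_col="label", pred_col="pred"):
    """Return size, accuracy, false positive rate, and false negative rate."""
    y, p = df[label_col], df[pred_col]
    tp = ((y == 1) & (p == 1)).sum()
    tn = ((y == 0) & (p == 0)).sum()
    fp = ((y == 0) & (p == 1)).sum()
    fn = ((y == 1) & (p == 0)).sum()
    return {
        "size": len(df),
        "accuracy": (tp + tn) / len(df),
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else float("nan"),
    }


# Hypothetical data file and column names, for illustration only.
df = pd.read_csv("predictions.csv")
subgroup = df[(df["sex"] == "Female") & (df["race"] == "Black")]

print(subgroup_metrics(subgroup))  # metrics for the intersectional subgroup
print(subgroup_metrics(df))        # baseline: metrics on the full dataset
```

Comparing the subgroup's metrics against the full-dataset baseline is the kind of contextualization the Subgroup Overview and Detailed Comparison View support interactively.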
Methodology
FairVis integrates a subgroup discovery process in which data instances are clustered based on feature similarity. Each cluster is then characterized by its dominant features: the entropy of every feature's value distribution within the cluster is computed, and low-entropy features (those dominated by a single value) are treated as the cluster's defining characteristics. These clusters are surfaced as subgroup suggestions that users can rank by potential fairness issues. Additionally, FairVis employs a statistical divergence measure (Jensen-Shannon divergence) to find subgroups whose feature distributions are similar to a selected subgroup, supporting further comparative analysis.
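As a rough illustration of this pipeline, the sketch below clusters instances on one-hot-encoded features, ranks each cluster's features by the entropy of their value distributions (low entropy indicating a dominant feature), and measures the similarity of two subgroups' distributions over a single feature with the Jensen-Shannon distance from SciPy. It is an approximation under assumed inputs (the hypothetical predictions.csv, illustrative feature names, and an arbitrary cluster count), not the system's actual implementation.

```python
# Sketch of the subgroup-suggestion ideas: clustering, entropy-based dominant
# features, and Jensen-Shannon similarity between feature distributions.
import pandas as pd
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon
from sklearn.cluster import KMeans

df = pd.read_csv("predictions.csv")            # hypothetical dataset
features = ["sex", "race", "age_bucket"]       # illustrative categorical features

# 1. Cluster instances on one-hot-encoded features.
X = pd.get_dummies(df[features])
df["cluster"] = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(X)


# 2. Rank a cluster's features by the entropy of their value distributions;
#    low entropy means the cluster is dominated by one value of that feature.
def dominant_features(cluster_df, top_k=2):
    scores = {
        f: entropy(cluster_df[f].value_counts(normalize=True)) for f in features
    }
    return sorted(scores, key=scores.get)[:top_k]


# 3. Compare two subgroups' distributions over one feature; SciPy returns the
#    Jensen-Shannon distance (square root of the divergence).
def js_distance(a_df, b_df, feature):
    cats = sorted(set(a_df[feature]) | set(b_df[feature]))
    p = a_df[feature].value_counts(normalize=True).reindex(cats, fill_value=0)
    q = b_df[feature].value_counts(normalize=True).reindex(cats, fill_value=0)
    return jensenshannon(p, q)


for c, cluster_df in df.groupby("cluster"):
    print(c, dominant_features(cluster_df))

# e.g., how different is the female subgroup from the full population on `race`?
print(js_distance(df[df["sex"] == "Female"], df, "race"))
```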
Implications and Future Directions
FairVis has significant implications for improving fairness audits in ML systems. By enabling researchers and practitioners to uncover intersectional biases, FairVis can inform adjustments to model training practices or datasets to achieve more equitable performance. The system's ability to suggest potentially biased subgroups and facilitate detailed comparative analyses provides a comprehensive framework for fairness-centric model evaluation.
The paper outlines potential future work, including expanding FairVis to support multiclass classification and other data modalities, such as textual or graphical data. These extensions would enhance the system's applicability across a broader range of ML problems. The authors also point to scalability improvements and to avenues for automatically mitigating detected biases, such as post-processing techniques.
In sum, FairVis represents a meaningful step toward addressing the pressing issue of fairness in machine learning. By providing tools for nuanced bias detection and analysis, the system empowers data scientists to make informed decisions about their models' behavior across subgroups, contributing to fairer algorithmic systems.