A Causal Framework for Discovering and Removing Direct and Indirect Discrimination
In the domain of data science and predictive modeling, ensuring non-discriminatory decision-making is a critical challenge. The paper "A Causal Framework for Discovering and Removing Direct and Indirect Discrimination" addresses this challenge by proposing a causal framework to identify and mitigate discrimination present in historical datasets. The work focuses on direct and indirect discrimination, employing causal network models, in particular path-specific effects, to measure and eliminate discriminatory effects before the data are used to build predictive models.
The core innovation of the paper is the use of causal modeling to discern the pathways through which discrimination operates. Direct discrimination is modeled as the causal effect transmitted along the direct path from a protected attribute (such as gender or race) to the decision outcome in a causal network. Indirect discrimination, in contrast, is modeled as the effect carried along indirect paths that pass through so-called redlining attributes: attributes whose use in the decision cannot be objectively justified and that correlate with the protected attribute (e.g., zip code acting as a proxy for race).
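The quantity underlying both measures is the path-specific effect: the change in the probability of a positive decision e+ when the protected attribute C is switched from the reference value c- to c+ along a chosen set of paths π only, while all other paths keep the reference value. A sketch of the definition, in notation close to the paper's (the exact symbols may differ), is:

```latex
% Path-specific effect of switching C from c^- to c^+ along the path set \pi only;
% all paths outside \pi (denoted \bar{\pi}) keep the reference value c^-.
SE_{\pi}(c^{+}, c^{-}) = P\big(e^{+} \mid do(c^{+}|_{\pi},\, c^{-}|_{\bar{\pi}})\big) - P\big(e^{+} \mid do(c^{-})\big)
```

Direct discrimination then corresponds to choosing π as the single edge from the protected attribute to the decision, while indirect discrimination corresponds to choosing π as the set of paths that pass through redlining attributes.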
The authors propose algorithms for detecting and removing these discriminatory effects. The Path-Specific Effect based Discrimination Discovery (PSE-DD) algorithm identifies discrimination by computing path-specific effects from observational data and checking them against a user-defined threshold. The method's computational efficiency is notable, given the complexity inherent in causal network construction and probability computation.
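The following toy sketch illustrates the kind of check PSE-DD performs, on a minimal three-node network with protected attribute C, redlining attribute R, and decision E (edges C → E, C → R, R → E). The CPT numbers and the threshold tau are invented for illustration; the actual algorithm operates on a causal network learned from the data and handles general graphs.

```python
# Toy illustration of a path-specific-effect check on the network
# C -> E (direct path) and C -> R -> E (indirect path through a redlining attribute).
# All probabilities and the threshold tau below are made up for illustration.

p_r1_given_c = {0: 0.70, 1: 0.30}              # P(R=1 | C=c)
p_e1_given_cr = {(0, 0): 0.35, (0, 1): 0.25,   # P(E=1 | C=c, R=r)
                 (1, 0): 0.50, (1, 1): 0.40}

def p_e1(c_direct, c_indirect):
    """P(E=1) when C takes value c_direct on the direct edge C -> E
    and value c_indirect on the indirect path C -> R -> E."""
    total = 0.0
    for r in (0, 1):
        p_r = p_r1_given_c[c_indirect] if r == 1 else 1.0 - p_r1_given_c[c_indirect]
        total += p_e1_given_cr[(c_direct, r)] * p_r
    return total

baseline = p_e1(c_direct=0, c_indirect=0)   # protected group's value (0) on every path
se_direct = p_e1(1, 0) - baseline           # switch C only on the direct edge
se_indirect = p_e1(0, 1) - baseline         # switch C only on the indirect path

tau = 0.05                                  # hypothetical discrimination threshold
print(f"direct PSE   = {se_direct:.3f} -> {'discriminatory' if se_direct > tau else 'ok'}")
print(f"indirect PSE = {se_indirect:.3f} -> {'discriminatory' if se_indirect > tau else 'ok'}")
```

With these made-up numbers the direct effect (0.15) exceeds the threshold while the indirect effect (0.04) does not, so only direct discrimination would be flagged.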
For discrimination removal, the Path-Specific Effect based Discrimination Removal (PSE-DR) algorithm generates a modified dataset by adjusting the conditional probability tables of the causal network. The adjustment is cast as a quadratic program, which keeps the modified data close to the original (preserving utility) while removing both direct and indirect discrimination. Crucially, the paper shows that predictive models trained on the modified data do not themselves incur discrimination, which distinguishes this solution from existing methods that may fail to guarantee unbiased decisions once a model is deployed.
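Below is a minimal sketch of the removal step as a quadratic program, reusing the toy network from above. It treats the entries of P(E=1 | C, R) as decision variables, keeps them as close as possible to the original CPT, and constrains the (here linear) direct and indirect path-specific effects to stay within the threshold. The use of cvxpy, the toy graph, and the exact form of the constraints are assumptions for illustration, not the paper's implementation.

```python
# Sketch of a PSE-DR-style quadratic program on the toy network C -> E, C -> R -> E.
# The CPT numbers, threshold, and constraint form are illustrative assumptions.
import cvxpy as cp
import numpy as np

theta0 = np.array([[0.35, 0.25],    # original P(E=1 | C=0, R=r) for r = 0, 1
                   [0.50, 0.40]])   # original P(E=1 | C=1, R=r) for r = 0, 1
p_r1_given_c = {0: 0.70, 1: 0.30}   # P(R=1 | C=c), left unchanged
tau = 0.05                          # hypothetical discrimination threshold

theta = cp.Variable((2, 2))         # modified CPT of the decision E

def w(c, r):
    """P(R=r | C=c), used as a fixed weight in the path-specific effects."""
    return p_r1_given_c[c] if r == 1 else 1.0 - p_r1_given_c[c]

# For this graph both path-specific effects are linear in the CPT entries.
se_direct = sum((theta[1, r] - theta[0, r]) * w(0, r) for r in (0, 1))
se_indirect = sum(theta[0, r] * (w(1, r) - w(0, r)) for r in (0, 1))

problem = cp.Problem(
    cp.Minimize(cp.sum_squares(theta - theta0)),   # stay close to the original data
    [cp.abs(se_direct) <= tau,                     # bound direct discrimination
     cp.abs(se_indirect) <= tau,                   # bound indirect discrimination
     theta >= 0, theta <= 1],                      # keep valid probabilities
)
problem.solve()
print("modified CPT:\n", np.round(theta.value, 3))
```

A modified dataset can then be regenerated from the network with the adjusted conditional probability table; because the path-specific effects are constrained directly on the network, the regenerated data carry neither direct nor indirect discrimination above the threshold.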
The implications of this research are twofold. Practically, it offers robust algorithms that can be integrated into data processing pipelines to ensure compliance with anti-discrimination legislation in automated decision-making systems. Theoretically, it advances the discourse on fairness in AI by showing that systemic bias can be quantitatively assessed and adjusted within data-driven frameworks. The work sets the stage for further exploration into refining causal modeling for fairness applications in various domains.
Looking forward, future developments in AI could benefit from refining causal inference methods to handle complex, high-dimensional datasets more efficiently. Expanding the framework to accommodate multiple protected attributes or exploring alternative causal discovery methods might enhance this approach's applicability and accuracy. Such advancements could lead to more nuanced models for detecting intricate forms of discrimination beyond the binary frameworks addressed in this paper.