Analyzing Bias in Facial Recognition Systems: Insights from the BFW Dataset
This paper presents a critical examination of bias within facial recognition (FR) systems and introduces the Balanced Faces In the Wild (BFW) dataset, designed to facilitate unbiased evaluations of FR algorithms. The authors address the inherent biases present in state-of-the-art FR systems, which often result from imbalanced training data distributions, particularly with respect to gender and ethnicity.
Dataset Construction and Problem Formulation
The BFW dataset is carefully curated to include balanced samples across eight demographic subgroups, defined by combinations of four ethnicities (Asian, Black, Indian, White) and two genders (Male, Female). Each subgroup comprises an equal number of identities and an equal number of face samples, providing a balanced benchmark against which demographic performance gaps in FR systems can be measured.
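To make the balancing idea concrete, below is a minimal sketch of how one might assemble such a subset from a larger face collection: the same number of identities per demographic subgroup and, as an additional assumption for illustration, the same number of images per identity. The `records` structure and all function and variable names are hypothetical, not from the paper or its released tooling.

```python
# Sketch of BFW-style balancing: equal identities per subgroup, equal images per identity.
# `records` is assumed to be an iterable of (subgroup, identity, image_path) tuples.
import random
from collections import defaultdict

def balanced_subset(records, ids_per_subgroup, imgs_per_id, seed=0):
    rng = random.Random(seed)
    by_subgroup = defaultdict(lambda: defaultdict(list))
    for subgroup, identity, image in records:
        by_subgroup[subgroup][identity].append(image)

    subset = []
    for subgroup, identities in by_subgroup.items():
        # Keep only identities with enough images, then draw the same count from every subgroup.
        eligible = [i for i, imgs in identities.items() if len(imgs) >= imgs_per_id]
        for identity in rng.sample(eligible, ids_per_subgroup):
            subset += [(subgroup, identity, img)
                       for img in rng.sample(identities[identity], imgs_per_id)]
    return subset
```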
The authors critique the conventional practice of applying a single global threshold to face-pair similarity scores, a practice that skews performance across demographic groups because the underlying score distributions differ from group to group. The paper documents this discrepancy both quantitatively and through visualizations of the per-group score distributions.
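The following short sketch illustrates the problem, under the assumption that verification scores, genuine/impostor labels, and a subgroup tag per pair are already available (the variable names and the quantile-based calibration are illustrative, not taken from the paper): a single threshold calibrated on pooled impostor scores yields a false positive rate that drifts away from the intended target inside individual subgroups.

```python
# Why one global threshold misleads: calibrate it on pooled impostor scores, then
# measure the FPR it actually produces within each demographic subgroup.
import numpy as np

def global_threshold_fpr_by_group(scores, labels, subgroups, target_fpr=1e-3):
    """scores: similarity per pair; labels: 1 = genuine, 0 = impostor; subgroups: tag per pair."""
    impostor_all = scores[labels == 0]
    t_global = np.quantile(impostor_all, 1.0 - target_fpr)  # one threshold for everyone
    per_group_fpr = {}
    for g in np.unique(subgroups):
        imp_g = scores[(subgroups == g) & (labels == 0)]
        per_group_fpr[g] = float(np.mean(imp_g >= t_global))  # often far from target_fpr
    return t_global, per_group_fpr
```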
Methodologies for Bias Mitigation
The research proposes an adaptive threshold approach in which a separate, subgroup-specific threshold is applied to each demographic group. Each threshold is chosen so that its subgroup meets the same intended False Positive Rate (FPR), which in turn narrows the gap in True Positive Rate (TPR) across subgroups. Empirical results show that this approach not only improves overall accuracy but also yields more equitable performance across demographic lines.
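A minimal sketch of this per-subgroup calibration is given below. It assumes the same score, label, and subgroup arrays as above; the quantile-based threshold choice and all names are illustrative rather than the paper's exact procedure.

```python
# Per-subgroup threshold calibration: each subgroup gets the threshold that meets the
# same intended FPR on its own impostor scores, then TPR is reported at that threshold.
import numpy as np

def subgroup_thresholds(scores, labels, subgroups, target_fpr=1e-3):
    """Pick one threshold per subgroup so each meets the same intended FPR."""
    thresholds = {}
    for g in np.unique(subgroups):
        mask = subgroups == g
        impostor = scores[mask & (labels == 0)]           # non-matching pairs in subgroup g
        thresholds[g] = np.quantile(impostor, 1.0 - target_fpr)
    return thresholds

def subgroup_tpr(scores, labels, subgroups, thresholds):
    """True Positive Rate per subgroup at its own calibrated threshold."""
    tpr = {}
    for g, t in thresholds.items():
        mask = subgroups == g
        genuine = scores[mask & (labels == 1)]            # matching pairs in subgroup g
        tpr[g] = float(np.mean(genuine >= t))
    return tpr
```

In this sketch, verification simply compares a pair's score against the threshold of that pair's subgroup instead of a single global value, which is what equalizes the operating point across groups.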
Impact and Implications
The implications of this research are multifaceted. Practically, the introduction of subgroup-specific thresholds in FR systems could lead to more reliable and fair applications in scenarios where algorithmic decisions impact personal and societal safety. Theoretically, the paper challenges the prevailing assumption that facial recognition technologies can be universally applicable without consideration of demographic variables.
The paper also includes a survey of human perception biases in facial recognition, paralleling algorithmic biases. Results indicate that humans, much like machines, perform better at recognizing individuals from their own demographic groups. This finding underscores the complexity of developing truly unbiased FR systems and highlights the need for continued research into both human and algorithmic biases.
Future Directions
Looking ahead, the development and deployment of fair FR systems necessitate ongoing refinement of training datasets and evaluation benchmarks. The BFW dataset provides a foundational resource for such work. Future research could extend the analysis to other demographic factors, such as age and cultural backgrounds, to further enhance the understanding and mitigation of bias in machine learning systems.
The significance of adapting FR systems to account for demographic variability cannot be overstated, as these systems increasingly intersect with legal, social, and ethical domains. As researchers continue to explore this evolving field, datasets like BFW will remain pivotal in driving improvements toward more just and equitable AI technologies.