Predictive Inequity in Object Detection (1902.11097v1)

Published 21 Feb 2019 in cs.CV, cs.LG, and stat.ML

Abstract: In this work, we investigate whether state-of-the-art object detection systems have equitable predictive performance on pedestrians with different skin tones. This work is motivated by many recent examples of ML and vision systems displaying higher error rates for certain demographic groups than others. We annotate an existing large scale dataset which contains pedestrians, BDD100K, with Fitzpatrick skin tones in ranges [1-3] or [4-6]. We then provide an in-depth comparative analysis of performance between these two skin tone groupings, finding that neither time of day nor occlusion explain this behavior, suggesting this disparity is not merely the result of pedestrians in the 4-6 range appearing in more difficult scenes for detection. We investigate to what extent time of day, occlusion, and reweighting the supervised loss during training affect this predictive bias.

An Analytical Perspective on Predictive Inequity in Pedestrian Detection

The paper "Predictive Inequity in Object Detection" presents a rigorous analysis of bias in machine learning systems, particularly focusing on object detection models used in autonomous vehicles. The research investigates the performance variance of these systems when tasked with detecting pedestrians of different skin tones, specifically comparing the Fitzpatrick scale groups 1-3 (LS) against 4-6 (DS).

Core Insights and Methodology

The paper uses the BDD100K dataset to conduct a comprehensive assessment. Human annotators categorized pedestrians by Fitzpatrick skin tone, with annotation reliability improving markedly for larger pedestrian bounding boxes, i.e., those with an area greater than 10,000 pixels, so the analysis concentrates on these. Notably, far more LS pedestrians were labeled than DS pedestrians, indicating a skew in data representation.
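
As a rough illustration of this filtering step, the sketch below selects pedestrian boxes above the 10,000-pixel area threshold from a simplified BDD100K-style label file. The file layout and field names are assumptions for illustration, not the paper's actual annotation pipeline.

```python
import json

MIN_AREA = 10_000  # pixel-area threshold below which skin-tone labels are less reliable


def large_pedestrian_boxes(label_path):
    """Return pedestrian boxes whose area exceeds MIN_AREA.

    Assumes a simplified BDD100K-style label file: a JSON list of frames,
    each with a "labels" list containing "category" and "box2d" entries.
    """
    with open(label_path) as f:
        frames = json.load(f)

    selected = []
    for frame in frames:
        for label in frame.get("labels", []):
            if label.get("category") != "pedestrian" or "box2d" not in label:
                continue
            b = label["box2d"]
            area = (b["x2"] - b["x1"]) * (b["y2"] - b["y1"])
            if area > MIN_AREA:
                selected.append((frame["name"], b))
    return selected
```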

The authors evaluated state-of-the-art models such as Faster R-CNN and Mask R-CNN across different training data sources, including the MS COCO and BDD100K datasets. These models showed consistently higher predictive performance for LS pedestrians than for DS pedestrians, as evidenced by the average precision scores. This discrepancy points to a systemic bias that transcends any individual model design or training dataset.
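
The per-group comparison can be pictured with a simplified diagnostic like the one below, which computes recall separately for LS and DS ground-truth boxes at a fixed IoU threshold. This is only a minimal proxy for the paper's average-precision protocol, and the data structures are assumed for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def per_group_recall(detections, ground_truth, iou_thresh=0.5):
    """Fraction of ground-truth pedestrians recovered, split by skin-tone group.

    detections:   list of (x1, y1, x2, y2) predicted boxes for one image
    ground_truth: list of ((x1, y1, x2, y2), group) with group in {"LS", "DS"}
    """
    hits, totals = {"LS": 0, "DS": 0}, {"LS": 0, "DS": 0}
    for gt_box, group in ground_truth:
        totals[group] += 1
        if any(iou(det, gt_box) >= iou_thresh for det in detections):
            hits[group] += 1
    return {g: hits[g] / totals[g] if totals[g] else float("nan") for g in totals}
```

Comparing the two entries of the returned dictionary across many images gives a coarse view of the LS/DS gap that the paper quantifies with average precision.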

Exploring Sources of Inequity

Different potential causes of this bias were assessed:

  1. Occlusion: The paper concluded that occlusion does not explain the disparity in predictive performance, as the performance gap persisted even when occluded pedestrians were excluded from the analysis.
  2. Time of Day: Results varied between daytime and nighttime tests. Surprisingly, at night, DS individuals were sometimes detected with higher accuracy than LS, though daytime results mirrored the original inequity. This inconsistency suggests that lighting conditions, although a consideration, are not a definitive explanatory factor.
  3. Training Data Bias: The research confirmed that data imbalance (i.e., more training data for LS individuals) contributed to predictive inequity. Models trained primarily on LS data exhibited a learning bias, which could be partially mitigated by reweighting the loss function to give higher weight to DS examples (see the sketch after this list).

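A minimal sketch of the loss-reweighting idea from point 3 is shown below, assuming a PyTorch-style classification head inside the detector. The weight value and the way group labels reach the loss are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F


def reweighted_classification_loss(logits, targets, group_ids, ds_weight=2.0):
    """Cross-entropy loss with higher weight on darker-skin (DS) examples.

    logits:    (N, num_classes) classification scores for N proposals
    targets:   (N,) ground-truth class indices
    group_ids: (N,) 1 for DS pedestrians, 0 otherwise
    ds_weight: multiplicative weight applied to DS examples (value is illustrative)
    """
    per_example = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(group_ids == 1,
                          torch.full_like(per_example, ds_weight),
                          torch.ones_like(per_example))
    return (weights * per_example).mean()
```
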
Practical and Theoretical Implications

The findings carry crucial implications for the fair deployment of ML systems in real-world scenarios, especially in sensitive domains such as autonomous driving. The evidence of systematic bias presses for an overhaul of dataset construction and model training practices. Failure to address these disparities not only risks perpetuating social biases but also creates ethical and legal risks, raising liability concerns and undermining trust in autonomous technologies.

Prospective Directions

For future research, there is a pressing need to innovate in:

  • Data Curation: Collecting balanced datasets with equitable representations of demographic variables is indispensable. This endeavor might include developing novel, large-scale datasets to validate findings with improved confidence metrics.
  • Model Training: Exploring advanced model architectures or training techniques that inherently incorporate fairness constraints could induce more equitable predictive behaviors.
  • Comprehensive Evaluation Techniques: Developing more nuanced evaluation metrics beyond average precision that capture the multifaceted nature of biased model outputs can enhance diagnostic capabilities.

The work by Wilson et al. exemplifies the intricate challenges underlying predictive fairness in machine learning systems. It calls not just for technical solutions, but also for a broader interdisciplinary dialogue involving ethicists, policymakers, and computer scientists. The ultimate vision is of equitable AI systems that effectively integrate societal and ethical considerations into their core design.

Authors (3)
  1. Benjamin Wilson (11 papers)
  2. Judy Hoffman (75 papers)
  3. Jamie Morgenstern (50 papers)
Citations (206)