A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores (2002.08035v2)

Published 19 Feb 2020 in cs.CY

Abstract: The increased use of algorithmic predictions in sensitive domains has been accompanied by both enthusiasm and concern. To understand the opportunities and risks of these technologies, it is key to study how experts alter their decisions when using such tools. In this paper, we study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions. We focus on the question: Are humans capable of identifying cases in which the machine is wrong, and of overriding those recommendations? We first show that humans do alter their behavior when the tool is deployed. Then, we show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk, even when overriding the recommendation requires supervisory approval. These results highlight the risks of full automation and the importance of designing decision pipelines that provide humans with autonomy.

Authors (3)
  1. Maria De-Arteaga (36 papers)
  2. Riccardo Fogliato (18 papers)
  3. Alexandra Chouldechova (46 papers)
Citations (162)

Summary

Analyzing Human Interaction with Erroneous Algorithmic Scores in Child Welfare Screening

The paper "A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores" examines the interaction between human decision-makers and algorithmic systems in the child welfare domain. The paper focuses specifically on decision-making processes when algorithmic predictions are incorrect due to technical glitches. It investigates whether human experts, who use these algorithmic tools, can discern incorrect predictions and make autonomous decisions that override misleading recommendations.

The critical context of this research is the Allegheny Family Screening Tool (AFST), a risk assessment tool deployed in Allegheny County to aid screening decisions on child maltreatment allegations. The tool uses predictive models to assign risk scores that assist the call workers who assess these allegations. The primary question explored is whether humans can identify erroneous algorithmic scores and override the corresponding recommendations when necessary.

Key Findings

  1. Behavioral Adjustments Post-Deployment: The paper found that call workers exhibited significant behavioral changes after the AFST was deployed. Prior to deployment, screen-in rates for child welfare investigations were relatively stable, driven largely by the capacity limits of available resources. Post-deployment, screen-in decisions aligned markedly more closely with the algorithm's risk scores, suggesting that decision-makers took the tool's recommendations into account rather than exhibiting complete algorithm aversion.
  2. Human Correction of Erroneous Scores: When a technical glitch caused the algorithm to display erroneous scores, call workers were able to recognize many of these errors and make decisions that departed from the flawed machine recommendations. For instance, screen-in rates for cases with clearly incorrect low-risk scores were higher than those scores alone would predict, indicating that workers could detect when risk was underestimated. In many instances, humans did not blindly adhere to the algorithm's recommendation, effectively balancing trust in the tool with their professional judgment (see the adherence-rate sketch after this list).
  3. Implications for Automation Bias and Algorithm Aversion: The findings suggest that the configuration of the decision-support system encouraged appropriate human-machine complementarity. Regarding automation bias, in which users over-rely on automated systems, call workers did not defer excessively to the machine's recommendation, even when it was presented as mandatory and overriding it required supervisory approval, unless it matched their own assessment. At the same time, workers did incorporate the machine-generated scores into their decisions and overruled the system only in certain scenarios, so complete algorithm aversion did not occur either.
  4. Racial and Socioeconomic Disparities: The research also delved into whether reliance on algorithmic systems exacerbated existing racial or socioeconomic disparities in decision-making. It found minimal changes in disparity levels, indicating that the decision-support tool neither significantly mitigated nor exacerbated disparities across racial lines or socioeconomic groups.
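
To make the second finding concrete, the sketch below shows one way adherence to the displayed recommendation could be measured separately for correct and erroneous scores. It is a minimal illustration in Python, not the authors' analysis code: the column names (displayed_score, screened_in, score_erroneous), the high-risk threshold, and the toy data are all assumptions introduced here.

```python
# Hypothetical sketch: compare how often screen-in decisions follow the
# displayed score's implied recommendation, split by whether that score
# was affected by the glitch. Column names and the threshold below are
# illustrative assumptions, not the AFST's actual schema or cutoff.
import pandas as pd

def adherence_rate(df: pd.DataFrame, high_risk_threshold: int = 15) -> pd.Series:
    """Share of decisions that match the displayed score's implied
    recommendation, grouped by whether the displayed score was erroneous."""
    recommended_screen_in = df["displayed_score"] >= high_risk_threshold
    adhered = recommended_screen_in == df["screened_in"]
    return adhered.groupby(df["score_erroneous"]).mean()

# Toy data: the finding predicts lower adherence when the displayed score
# misrepresents the true risk (score_erroneous == True).
calls = pd.DataFrame({
    "displayed_score": [18, 19, 4, 3, 16, 2],
    "screened_in":     [True, True, True, False, False, True],
    "score_erroneous": [False, False, True, False, False, True],
})
print(adherence_rate(calls))
```

Under this framing, a lower adherence rate in the erroneous group is the pattern the paper reports: workers depart from the displayed recommendation more often when it misstates risk.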

Broader Implications

The paper underscores the importance of human oversight in automated decision-making, especially in high-stakes environments such as child welfare. It advocates for designs in which human expertise and algorithmic predictions complement each other, avoiding the extremes of unquestioning trust in, or wholesale skepticism of, machine outputs. This balance improves decision outcomes by drawing on the strengths of both human judgment and algorithmic efficiency.

In conclusion, this paper provides insightful contributions to the discourse on human-in-the-loop systems, asserting that humans, equipped with proper support and autonomy, can effectively manage erroneous predictions in algorithm-assisted frameworks. Future research should explore how best to structure these systems for optimal use of human judgment, thereby enhancing the efficacy and fairness of algorithmic decision-making in sensitive domains.
