Understanding Interpretability in Machine Learning Models: A Comprehensive Study
The paper "Manipulating and Measuring Model Interpretability" explores the impact of interpretability factors on human interaction with machine learning models. The paper focuses on how model transparency and the number of features affect users' ability to understand and utilize model predictions. The researchers employ a series of controlled experiments to investigate these influences, providing insightful findings that challenge prevailing assumptions about interpretability.
Key Findings and Analysis
The researchers conducted pre-registered experiments with 3,800 participants to assess how model transparency and the number of features affect three outcomes; a minimal sketch of the manipulated conditions and measures follows this list:
- Simulation Capability: Participants were better able to simulate the predictions of transparent models with fewer features. This held across two experiments involving real-estate valuation tasks, indicating that simple, transparent models enhance users' ability to internalize model logic.
- Following Predictions: Contrary to assumptions, simpler and more transparent models did not significantly influence the extent to which participants followed their predictions when advantageous. This suggests that transparency alone does not necessarily improve decision-making consistency, highlighting the need for further exploration into factors that promote adherence to model advice.
- Error Detection: Surprisingly, participants were worse at detecting and correcting a model's sizable mistakes when the model was transparent, possibly because exposing model internals overloads attention; transparency can thus impair rather than aid error detection.
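To make the manipulated factors concrete, here is a minimal Python sketch of the kind of setup the experiments revolve around: a transparent linear pricing model restricted to a smaller or a larger feature set, together with the behavioral measures described above (simulation error and deviation from the model's advice). The feature names, coefficients, and apartment values are hypothetical illustrations, not the paper's materials.

```python
# Illustrative sketch (not the authors' code): a transparent linear model for
# apartment-price prediction under the paper's two manipulated factors --
# number of features (shown here as 2 vs. 8) and transparency (weights shown
# to the participant or hidden). All names and numbers are hypothetical.

FEATURES_2 = ["square_feet", "num_bathrooms"]
FEATURES_8 = FEATURES_2 + ["num_bedrooms", "floor", "days_on_market",
                           "maintenance_fee", "distance_to_subway", "year_built"]

# Hypothetical coefficients (price contribution in dollars).
COEFS = {"square_feet": 350.0, "num_bathrooms": 20_000.0, "num_bedrooms": 15_000.0,
         "floor": 1_000.0, "days_on_market": -200.0, "maintenance_fee": -50.0,
         "distance_to_subway": -8_000.0, "year_built": 100.0}

def model_prediction(apartment: dict, features: list) -> float:
    """Linear model restricted to the given feature set (a 'clear' condition
    would display these weights; a 'black box' condition would not)."""
    return sum(COEFS[f] * apartment[f] for f in features)

def simulation_error(participant_guess: float, model_pred: float) -> float:
    """Gap between what the participant thinks the model will predict
    and what it actually predicts (the simulation outcome)."""
    return abs(participant_guess - model_pred)

def deviation_from_model(final_answer: float, model_pred: float) -> float:
    """How far the participant's own final estimate moves away from the
    model's advice (used to measure whether people follow the model)."""
    return abs(final_answer - model_pred)

apartment = {"square_feet": 900, "num_bathrooms": 1, "num_bedrooms": 2,
             "floor": 4, "days_on_market": 30, "maintenance_fee": 800,
             "distance_to_subway": 0.3, "year_built": 1985}

pred_small = model_prediction(apartment, FEATURES_2)
pred_large = model_prediction(apartment, FEATURES_8)
print(f"2-feature model prediction: ${pred_small:,.0f}")
print(f"8-feature model prediction: ${pred_large:,.0f}")
print("Simulation error for a $360,000 guess:", simulation_error(360_000, pred_small))
print("Deviation for a $350,000 final answer:", deviation_from_model(350_000, pred_small))
```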
Implications and Future Directions
The results underline that intuitive notions of interpretability—specifically, the presumed benefits of transparency—do not always align with empirical evidence. This challenges designers and researchers to reconsider the commonly held belief that transparent models inherently lead to better human-model collaboration.
Practically, the paper suggests that merely revealing model internals is insufficient for effective decision-making support. Systems designed for human use should account for cognitive load management, possibly by incorporating auxiliary systems to manage information overload or by selectively revealing model details upon user request.
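As one illustration of "selectively revealing model details upon user request", the sketch below shows a progressive-disclosure pattern in which only the prediction is displayed by default and per-feature contributions appear on demand. The data structures and values are assumptions made for illustration, not part of the paper.

```python
# Hedged sketch of progressive disclosure: show only the prediction by
# default, and reveal per-feature contributions only when the user asks,
# limiting the information presented at once. Values are illustrative.
from dataclasses import dataclass

@dataclass
class Explanation:
    prediction: float
    weights: dict  # feature -> dollar contribution, hidden unless requested

def render(expl: Explanation, show_details: bool = False) -> str:
    """Render a terse summary by default; append weights only on request."""
    lines = [f"Predicted price: ${expl.prediction:,.0f}"]
    if show_details:
        lines += [f"  {name}: {contrib:+,.0f}" for name, contrib in expl.weights.items()]
    return "\n".join(lines)

expl = Explanation(335_000.0, {"square_feet": 315_000.0, "num_bathrooms": 20_000.0})
print(render(expl))                     # terse view by default
print(render(expl, show_details=True))  # full weights on demand
```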
Theoretically, the findings prompt a reassessment of interpretability metrics in AI, advocating behavior-based evaluations rather than reliance on structural features of models alone. Exploring alternative or complementary approaches, such as auxiliary models that highlight potential outliers or the sequential presentation of data, is warranted; a sketch of the outlier-flagging idea follows.
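Below is a minimal sketch of one such complementary approach: an auxiliary check that flags inputs lying far from the training distribution, signaling that the model's prediction deserves extra scrutiny. The z-score rule, threshold, and synthetic training data are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of an auxiliary outlier check: flag inputs whose features
# fall far outside the training distribution so users know when to scrutinize
# the model's prediction. Threshold and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic training data for two features, e.g. square feet and bathrooms.
train_X = rng.normal(loc=[900, 1.5], scale=[250, 0.5], size=(500, 2))

mean, std = train_X.mean(axis=0), train_X.std(axis=0)

def flag_outlier(x: np.ndarray, threshold: float = 3.0) -> bool:
    """Return True if any feature lies more than `threshold` standard
    deviations from its training mean."""
    z = np.abs((x - mean) / std)
    return bool((z > threshold).any())

print(flag_outlier(np.array([950, 1.0])))   # typical apartment -> False
print(flag_outlier(np.array([4000, 1.0])))  # unusually large  -> True
```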
Future research should extend beyond linear regression models and include diverse domains and user expertise levels. Longitudinal studies incorporating process measures could further illuminate the cognitive mechanisms at play when interacting with interpretable models.
Conclusion
This paper offers a careful empirical examination of model interpretability and its practical implications. By favoring measurement over intuition, it contributes significantly to understanding how machine learning models can be effectively integrated into human decision-making, and it calls for more nuanced approaches to designing and presenting AI systems that support human-machine collaboration.