Improving Expert Predictions with Conformal Prediction (2201.12006v5)
Abstract: Automated decision support systems promise to help human experts solve multiclass classification tasks more efficiently and accurately. However, existing systems typically require experts to understand when to cede agency to the system and when to exercise their own; otherwise, the experts may be better off solving the classification tasks on their own. In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system in order to benefit from it. Rather than providing single label predictions and letting experts decide when to trust them, our system provides sets of label predictions constructed using conformal prediction (prediction sets) and requires experts to predict labels from these sets. By using conformal prediction, our system can precisely trade off the probability that the true label is not in the prediction set, which determines how frequently the system will mislead the experts, against the size of the prediction set, which determines the difficulty of the classification task the experts need to solve using the system. In addition, we develop an efficient and near-optimal search method to find the conformal predictor under which the experts benefit the most from using our system. Simulation experiments using synthetic and real expert predictions demonstrate that our system may help experts make more accurate predictions and is robust to the accuracy of the classifier the conformal predictor relies on.
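The coverage/set-size trade-off described above can be illustrated with a minimal split conformal prediction sketch. This is not the paper's search method, just a standard split conformal construction under assumed inputs: synthetic softmax scores standing in for the classifier, and a nonconformity score of one minus the softmax probability of the true label. Lowering `alpha` tightens coverage but yields larger (harder-to-read) prediction sets.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes = 500, 5  # hypothetical calibration set and label space

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Synthetic "classifier" output: logits nudged toward the true label.
y_cal = rng.integers(0, n_classes, size=n_cal)
logits = rng.normal(size=(n_cal, n_classes))
logits[np.arange(n_cal), y_cal] += 2.0
probs = softmax(logits)

alpha = 0.1  # target miscoverage: true label outside the set at most ~10% of the time

# Nonconformity score: 1 - softmax probability assigned to the true label.
scores = 1.0 - probs[np.arange(n_cal), y_cal]

# Conformal quantile with the finite-sample (n + 1) correction.
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q = np.quantile(scores, level, method="higher")

def prediction_set(p):
    """Labels whose nonconformity score falls below the calibrated threshold."""
    return np.where(1.0 - p <= q)[0]

# Prediction set for a new test point.
test_probs = softmax(rng.normal(size=(1, n_classes)))
print(prediction_set(test_probs[0]))
```

By construction, roughly a 1 - alpha fraction of calibration points have their true label inside the set; the expert then only needs to choose among the labels the set retains.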