"Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making (2403.09552v1)

Published 14 Mar 2024 in cs.HC

Abstract: In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective: "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. In our second study, we propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces.
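
The abstract does not specify how self-confidence calibration is measured. As a rough illustration of what "calibrated self-confidence" means operationally, the sketch below computes two standard calibration measures, the Brier score and expected calibration error (ECE), over per-trial confidence ratings and correctness outcomes. The function names and example values are illustrative assumptions, not the paper's own protocol.

```python
# Minimal sketch of quantifying self-confidence calibration (assumed metrics:
# Brier score and expected calibration error; the paper's abstract does not
# state which measure it uses).
import numpy as np

def brier_score(confidence, correct):
    """Mean squared gap between stated confidence and actual correctness (0/1)."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    return float(np.mean((confidence - correct) ** 2))

def expected_calibration_error(confidence, correct, n_bins=10):
    """Bin-weighted average of |empirical accuracy - mean confidence| per bin."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of trials in the bin
    return float(ece)

# Hypothetical per-trial data: a person's stated confidence vs. whether they were right.
conf = [0.9, 0.8, 0.6, 0.95, 0.5, 0.7]
hit  = [1,   0,   1,   1,    0,   1]
print(brier_score(conf, hit), expected_calibration_error(conf, hit))
```

Under these (assumed) measures, a well-calibrated decision maker reports confidences that match their empirical accuracy within each bin, driving the ECE toward zero; overconfidence shows up as confidence exceeding accuracy.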

Authors (6)
  1. Shuai Ma (86 papers)
  2. Xinru Wang (18 papers)
  3. Ying Lei (8 papers)
  4. Chuhan Shi (12 papers)
  5. Ming Yin (70 papers)
  6. Xiaojuan Ma (74 papers)
Citations (8)