Learning To Guide Human Decision Makers With Vision-Language Models (2403.16501v2)
Abstract: There is increasing interest in developing AIs for assisting human decision-making in high-stakes tasks, such as medical diagnosis, for the purpose of improving decision quality and reducing cognitive strain. Mainstream approaches team up an expert with a machine learning model to which safer decisions are offloaded, thus letting the former focus on cases that demand their attention. This separation-of-responsibilities setup, however, is inadequate for high-stakes scenarios. On the one hand, the expert may end up over-relying on the machine's decisions due to anchoring bias, thus losing the human oversight that is increasingly being required by regulatory agencies to ensure trustworthy AI. On the other hand, the expert is left entirely unassisted on the (typically hardest) decisions on which the model abstained. As a remedy, we introduce learning to guide (LTG), an alternative framework in which, rather than taking control from the human expert, the machine provides guidance useful for decision making, and the human is entirely responsible for coming up with a decision. In order to ensure guidance is interpretable and task-specific, we develop SLOG, an approach for turning any vision-language model into a capable generator of textual guidance by leveraging a modicum of human feedback. Our empirical evaluation highlights the promise of SLOG on a challenging, real-world medical diagnosis task.