A Formalism and Approach for Improving Robustness of Large Language Models Using Risk-Adjusted Confidence Scores (2310.03283v1)
Abstract: LLMs, such as ChatGPT, have achieved impressive milestones in NLP. Despite their strong performance, these models are known to pose important risks. As they are deployed in real-world applications, a systematic understanding of the different risks they pose on tasks such as natural language inference (NLI) is much needed. In this paper, we define and formalize two distinct types of risk: decision risk and composite risk. We also propose a risk-centric evaluation framework, with four novel metrics, for assessing LLMs on these risks in both in-domain and out-of-domain settings. Finally, we propose a risk-adjusted calibration method called DwD that helps LLMs minimize these risks within an overall NLI architecture. Detailed experiments, using four NLI benchmarks, three baselines, and two LLMs, including ChatGPT, show both the practical utility of the evaluation framework and the efficacy of DwD in reducing decision and composite risk. For instance, with DwD, an underlying LLM is able to address an extra 20.1% of low-risk inference tasks (which it would otherwise erroneously deem high-risk without risk adjustment) and to skip a further 19.8% of high-risk tasks that would have been answered incorrectly.
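The abstract does not spell out DwD's internals, but the selective-answering behavior it describes, answering low-risk NLI instances and skipping high-risk ones based on a risk-adjusted confidence score, can be illustrated with a minimal sketch. The function names (`predict`, `calibrate`), the `NLIDecision` container, and the 0.5 abstention threshold below are illustrative assumptions, not the paper's actual DwD implementation.

```python
# Minimal sketch of risk-adjusted selective prediction for NLI.
# Assumes an LLM wrapper that returns (label, raw confidence) and a
# calibrator that maps raw confidence to a risk-adjusted score in [0, 1].

from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class NLIDecision:
    answer: Optional[str]  # predicted label, or None if the instance is skipped
    confidence: float      # risk-adjusted confidence in [0, 1]


def decide(
    premise: str,
    hypothesis: str,
    predict: Callable[[str, str], Tuple[str, float]],  # LLM: (label, raw confidence)
    calibrate: Callable[[float], float],               # risk adjuster (DwD-style)
    threshold: float = 0.5,                            # assumed abstention cutoff
) -> NLIDecision:
    label, raw_conf = predict(premise, hypothesis)
    adj_conf = calibrate(raw_conf)
    if adj_conf < threshold:
        # High composite risk: skip rather than risk an incorrect answer.
        return NLIDecision(answer=None, confidence=adj_conf)
    # Low decision risk: answer the instance.
    return NLIDecision(answer=label, confidence=adj_conf)
```

Under this framing, the paper's reported gains correspond to the calibrator moving instances across the threshold in both directions: raising the adjusted confidence of instances the raw LLM would wrongly skip, and lowering it for instances the raw LLM would wrongly answer.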