Ask Again, Then Fail: Large Language Models' Vacillations in Judgment (2310.02174v5)
Abstract: We observe that current conversational LLMs often waver in their judgments when faced with follow-up questions, even when the original judgment was correct. This wavering poses a significant challenge to generating reliable responses and building user trust. To assess the issue comprehensively, we introduce a Follow-up Questioning Mechanism together with two metrics that quantify this inconsistency, confirming its widespread presence in current LLMs. To mitigate it, we explore various prompting strategies for closed-source models; moreover, we develop a training-based framework, Unwavering-FQ, that teaches LLMs to maintain their originally correct judgments through synthesized high-quality preference data. Our experiments confirm the effectiveness of the framework and its ability to enhance models' general capabilities.
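To make the probing setup concrete, here is a minimal sketch of how such a follow-up questioning probe might be implemented. It assumes a hypothetical `chat(messages)` wrapper around any conversational LLM API that takes an OpenAI-style message list and returns a string; the challenge wording and the `modification_rate` function below are illustrative stand-ins for the paper's mechanism and metrics, not their exact definitions.

```python
# Illustrative sketch of a follow-up questioning probe (not the paper's code).
# `chat(messages)` is a hypothetical wrapper around any conversational LLM API.

from typing import Callable, Dict, List

Message = Dict[str, str]


def probe_judgment(
    chat: Callable[[List[Message]], str],
    question: str,
    is_correct: Callable[[str], bool],
    challenge: str = "Are you sure? Please think again.",  # illustrative wording
) -> Dict[str, bool]:
    """Ask a question, then challenge the answer and record whether it flips."""
    history: List[Message] = [{"role": "user", "content": question}]
    first = chat(history)
    history += [
        {"role": "assistant", "content": first},
        {"role": "user", "content": challenge},
    ]
    second = chat(history)
    return {
        "first_correct": is_correct(first),
        "second_correct": is_correct(second),
    }


def modification_rate(results: List[Dict[str, bool]]) -> float:
    """Share of initially correct answers that turn wrong after the follow-up;
    one plausible reading of the paper's inconsistency metric."""
    initially_correct = [r for r in results if r["first_correct"]]
    if not initially_correct:
        return 0.0
    flipped = sum(1 for r in initially_correct if not r["second_correct"])
    return flipped / len(initially_correct)
```

In practice one would run `probe_judgment` over a labeled benchmark and report how often initially correct answers flip; a high rate signals the kind of vacillation the paper describes.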