Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts (2309.17415v3)
Abstract: This study investigates how LLMs behave when an external prompt conflicts with their internal memory. Understanding this will not only illuminate LLMs' decision mechanisms but also benefit real-world applications such as retrieval-augmented generation (RAG). Drawing on cognitive theory, we first target a decision-making scenario in which neither side of the conflict is superior, and categorize LLMs' preferences into dependent, intuitive, and rational/irrational styles. A second scenario, factual robustness, considers the correctness of the prompt and the memory in knowledge-intensive tasks; it also distinguishes whether an LLM behaves rationally or irrationally in the first scenario. To quantify these behaviors, we establish a complete benchmarking framework comprising a dataset, a robustness evaluation pipeline, and corresponding metrics. Extensive experiments with seven LLMs reveal their varying behaviors. Moreover, role-play intervention can change a model's style, though different models exhibit distinct adaptability and upper bounds. A key takeaway is to optimize models or prompts according to the identified style. For instance, RAG systems built on models with high role-play adaptability may dynamically adjust the intervention according to the quality of retrieval results: acting dependently to better leverage informative context, and intuitively when the external prompt is noisy.
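To make the style taxonomy concrete, below is a minimal Python sketch of the idea behind the first scenario: compare the model's closed-book answer (its memory) with its answer when the prompt injects a conflicting claim. Following the injected claim suggests a dependent style; sticking with the memory answer suggests an intuitive style. The helper `query_model` and the naive string matching are assumptions for illustration, not the authors' actual pipeline.

```python
# Sketch: classify one response under a prompt-memory conflict.
# `query_model` is a hypothetical wrapper around any chat-LLM API;
# the dependent/intuitive logic mirrors the taxonomy in the abstract.

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with your API client."""
    raise NotImplementedError

def classify_style(question: str, conflicting_claim: str) -> str:
    # 1. Closed-book answer: what the model "remembers".
    memory_answer = query_model(f"Answer briefly: {question}")

    # 2. Answer with a conflicting claim injected into the context.
    conflict_prompt = (
        f"Context: {conflicting_claim}\n"
        f"Answer briefly: {question}"
    )
    context_answer = query_model(conflict_prompt)

    # 3. Compare: following the injected claim -> dependent;
    #    sticking with the original memory answer -> intuitive.
    if conflicting_claim.lower() in context_answer.lower():
        return "dependent"
    if memory_answer.lower() in context_answer.lower():
        return "intuitive"
    return "undetermined"
```

Aggregating such per-example labels over a dataset yields style metrics; prepending a role-play instruction to the conflict prompt (e.g., "Trust only your own knowledge") is one way to probe how far an intervention can shift the measured style.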