
Aligning Language Models to Explicitly Handle Ambiguity (2404.11972v3)

Published 18 Apr 2024 in cs.CL

Abstract: In interactions between users and LLM agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art LLMs still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by an LLM may vary depending on its possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our findings show that APA outperforms training with gold-standard labels, especially in out-of-distribution scenarios. The data and code are available at https://github.com/heyjoonkim/APA.
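The core idea of the abstract — using a model's own disagreement across sampled answers as a proxy for perceived ambiguity, then building alignment targets from it — can be sketched as follows. This is an illustrative simplification, not the paper's exact pipeline: the `perceived_ambiguity` score, the 0.5 threshold, and the clarification-style target string are all assumptions made for the sketch.

```python
from collections import Counter

def perceived_ambiguity(answers):
    """Estimate perceived ambiguity for a query from a model's sampled
    answers: 1 minus the fraction held by the most frequent answer.
    High disagreement among samples suggests the model itself finds
    the query ambiguous (a stand-in for the paper's notion)."""
    counts = Counter(a.strip().lower() for a in answers)
    top_count = counts.most_common(1)[0][1]
    return 1.0 - top_count / len(answers)

def build_alignment_example(query, answers, threshold=0.5):
    """Turn a query plus its sampled answers into a supervised target:
    queries the model perceives as ambiguous get a clarification-style
    response; clear queries keep the model's majority answer."""
    if perceived_ambiguity(answers) >= threshold:
        target = "The question is ambiguous; please clarify what you mean."
    else:
        counts = Counter(a.strip().lower() for a in answers)
        target = counts.most_common(1)[0][0]
    return {"prompt": query, "target": target}
```

Under this sketch, a query whose samples all agree keeps its answer, while a query whose samples scatter is mapped to a clarification target, yielding training data for explicit ambiguity handling.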

Authors (8)
  1. Hyuhng Joon Kim (10 papers)
  2. Youna Kim (7 papers)
  3. Cheonbok Park (20 papers)
  4. Junyeob Kim (7 papers)
  5. Choonghyun Park (6 papers)
  6. Kang Min Yoo (40 papers)
  7. Sang-goo Lee (40 papers)
  8. Taeuk Kim (38 papers)
Citations (2)