CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models (2405.12063v2)

Published 20 May 2024 in cs.CL

Abstract: LLMs are increasingly used to meet user information needs, but their effectiveness in handling user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs built on a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality instances to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even when enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may induce overconfidence in LLMs and yield only marginal gains in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to inadequate conflict resolution and inaccurate use of their inherent knowledge. In this way, CLAMBER provides guidance for and promotes further research on proactive and trustworthy LLMs. Our dataset is available at https://github.com/zt991211/CLAMBER
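The abstract describes probing off-the-shelf LLMs with few-shot, chain-of-thought prompts to decide whether a user query is ambiguous and, if so, to ask a clarifying question. A minimal sketch of what such an evaluation loop might look like is below; the prompt wording, the ask_llm() stub, and the toy labels are illustrative assumptions, not the paper's actual protocol or data format.

```python
# Sketch of a CLAMBER-style ambiguity-identification evaluation.
# The few-shot examples, ask_llm() stub, and toy dataset are assumptions
# for illustration only; they do not reproduce the paper's benchmark.

FEW_SHOT_EXAMPLES = [
    ("Who won the World Cup?",
     "Ambiguous: the year and the sport are unspecified. "
     "Clarifying question: Which year and which sport do you mean?"),
    ("How many days are in a leap year?",
     "Unambiguous."),
]

def build_prompt(query: str) -> str:
    """Compose a few-shot chain-of-thought prompt that asks the model to
    label a query as ambiguous or not and to pose a clarifying question."""
    lines = [
        "Decide whether each query is ambiguous. Think step by step.",
        "If it is ambiguous, pose one clarifying question.\n",
    ]
    for q, a in FEW_SHOT_EXAMPLES:
        lines.append(f"Query: {q}\nAnswer: {a}\n")
    lines.append(f"Query: {query}\nAnswer:")
    return "\n".join(lines)

def ask_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., an API client). Returns a
    canned response so the sketch runs end to end."""
    return "Ambiguous: the referent of the question is unclear."

def evaluate(dataset) -> float:
    """Score (query, label) pairs, where label True means ambiguous.
    A reply is counted as predicting 'ambiguous' if it starts with that word."""
    correct = 0
    for query, is_ambiguous in dataset:
        reply = ask_llm(build_prompt(query))
        predicted = reply.lower().startswith("ambiguous")
        correct += (predicted == is_ambiguous)
    return correct / len(dataset)

if __name__ == "__main__":
    toy = [("When did it happen?", True),
           ("How many days are in a leap year?", False)]
    print(f"accuracy: {evaluate(toy):.2f}")
```

A real harness would replace ask_llm() with an actual model call and parse the clarifying question separately, which is where the paper reports current LLMs falling short.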

Authors (9)
  1. Tong Zhang (569 papers)
  2. Peixin Qin (21 papers)
  3. Yang Deng (113 papers)
  4. Chen Huang (88 papers)
  5. Wenqiang Lei (66 papers)
  6. Junhong Liu (13 papers)
  7. Dingnan Jin (8 papers)
  8. Hongru Liang (18 papers)
  9. Tat-Seng Chua (360 papers)
Citations (2)
