
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning (2306.09030v2)

Published 15 Jun 2023 in cs.CL

Abstract: Pragmatic reasoning plays a pivotal role in deciphering the implicit meanings that frequently arise in real-life conversations, and it is essential for the development of communicative social agents. In this paper, we introduce a novel challenge, DiPlomat, aimed at benchmarking machines' capabilities in pragmatic reasoning and situated conversational understanding. Whereas previous work treats different figurative expressions (e.g., metaphor, sarcasm) as individual tasks, DiPlomat provides a cohesive framework for general pragmatic understanding. Our dataset was created via Amazon Mechanical Turk (AMT), resulting in a total of 4,177 multi-turn dialogues. In conjunction with the dataset, we propose two tasks: Pragmatic Identification and Reasoning (PIR) and Conversational Question Answering (CQA). Experiments with state-of-the-art (SOTA) neural architectures reveal several significant findings: 1) large language models (LLMs) perform poorly in this subjective domain; 2) comprehensive understanding of context is a critical factor for establishing benign human-machine interactions; 3) current models are deficient in applying pragmatic reasoning. We therefore call for more attention to improving context understanding, reasoning, and the modeling of implied meaning.
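To make the PIR task concrete, the sketch below shows what a single instance might look like: given a multi-turn dialogue, a model must first identify the turn that carries a pragmatic (implied) meaning and then select the rationale explaining it. The field names, dialogue, and answer choices here are invented for illustration; they are not DiPlomat's actual schema or data.

```python
# Hypothetical illustration of a Pragmatic Identification and Reasoning (PIR)
# instance. All field names and content are invented for this sketch and do
# not reflect the dataset's real format.
pir_instance = {
    "dialogue": [
        "A: How did the concert go last night?",
        "B: Well, the parking was easy to find.",
    ],
    # Sub-task 1: index of the turn carrying an implied meaning.
    "pragmatic_turn": 1,
    # Sub-task 2: candidate rationales for the implied meaning.
    "choices": [
        "B enjoyed the concert very much.",
        "B is avoiding saying the concert was disappointing.",
        "B only cares about parking.",
    ],
    "label": 1,  # index of the gold rationale
}


def gold_answer(instance: dict) -> tuple[int, str]:
    """Return the (turn index, rationale) pair a model would be scored against."""
    turn = instance["pragmatic_turn"]
    rationale = instance["choices"][instance["label"]]
    return turn, rationale


turn, rationale = gold_answer(pir_instance)
print(turn, rationale)
```

A model is evaluated on both steps: picking the pragmatic turn and choosing the correct rationale, which is why the paper treats identification and reasoning as a joint task rather than two independent classifications.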
