ConvoSense: Overcoming Monotonous Commonsense Inferences for Conversational AI (2401.15471v1)
Abstract: Commonsense understanding and reasoning are essential for conducting engaging conversations. Although several datasets have been created to facilitate commonsense inference in dialogue contexts, existing datasets tend to lack in-depth detail, restate information already present in the conversation, and often fail to capture the multifaceted nature of commonsense reasoning. In response to these limitations, we use GPT to compile ConvoSense, a new synthetic dataset for commonsense reasoning in dialogue contexts that offers greater contextual novelty, a higher volume of inferences per example, and substantially more detailed inferences. Our dataset contains over 500,000 inferences across 12,000 dialogues with 10 popular inference types, which enables the training of generative commonsense models for dialogue that are better at producing plausible inferences with high novelty than models trained on previous datasets. To the best of our knowledge, ConvoSense is the first of its kind to provide such a multitude of novel inferences at such a large scale.
Authors: Sarah E. Finch, Jinho D. Choi