
Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study

Published 15 May 2023 in cs.CL (arXiv:2305.08391v2)

Abstract: Large language models (LLMs) such as ChatGPT have shown remarkable capability in many downstream tasks, yet their ability to understand the discourse structures of dialogues remains underexplored, even though this requires higher-level understanding and reasoning. In this paper, we systematically inspect ChatGPT's performance on two discourse analysis tasks, topic segmentation and discourse parsing, focusing on its deep semantic understanding of the linear and hierarchical discourse structures underlying dialogue. To instruct ChatGPT to complete these tasks, we first craft a prompt template consisting of the task description, output format, and structured input. We then conduct experiments on four popular topic segmentation datasets and two discourse parsing datasets. The experimental results show that ChatGPT is proficient at identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations. We also find that ChatGPT hardly understands rhetorical structures that are more complex than topic structures. Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures. In addition, we examine the impact of in-context learning (e.g., chain-of-thought) on ChatGPT and conduct an ablation study on various prompt components, which can provide a research foundation for future work. The code is available at https://github.com/yxfanSuda/GPTforDDA.
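
The three-part prompt template mentioned in the abstract (task description, output format, structured input) could be assembled along the following lines. This is only a minimal sketch under stated assumptions: the function name `build_prompt`, the section labels, and the utterance numbering scheme are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Tuple


def build_prompt(
    task_description: str,
    output_format: str,
    utterances: List[Tuple[str, str]],
) -> str:
    """Assemble a prompt from a task description, an output format,
    and a structured (numbered) dialogue. Hypothetical layout."""
    # Number each utterance so the model can refer to it by index.
    dialogue = "\n".join(
        f"[{i}] {speaker}: {text}"
        for i, (speaker, text) in enumerate(utterances, start=1)
    )
    return (
        f"Task: {task_description}\n\n"
        f"Output format: {output_format}\n\n"
        f"Dialogue:\n{dialogue}"
    )


# Illustrative usage with made-up dialogue content.
prompt = build_prompt(
    "Split the dialogue into topically coherent segments.",
    "A list of utterance indices at which a new topic starts.",
    [("A", "Hi, how was your trip?"), ("B", "Great. By the way, is the report done?")],
)
print(prompt)
```

Keeping the three components as separate arguments makes it straightforward to drop or vary one of them, which is the kind of manipulation an ablation over prompt components would need.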
