eRST: A Signaled Graph Theory of Discourse Relations and Organization (2403.13560v1)

Published 20 Mar 2024 in cs.CL

Abstract: In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, nonprojective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address these using constructs in the proposed theory. We provide annotation, search and visualization tools for data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.

Authors (6)
  1. Amir Zeldes
  2. Tatsuya Aoyama
  3. Yang Janet Liu
  4. Siyao Peng
  5. Debopam Das
  6. Luke Gessler