Finding Pragmatic Differences Between Disciplines (2310.00204v1)

Published 30 Sep 2023 in cs.CL

Abstract: Scholarly documents have a great degree of variation, both in terms of content (semantics) and structure (pragmatics). Prior work in scholarly document understanding emphasizes semantics through document summarization and corpus topic modeling but tends to omit pragmatics such as document organization and flow. Using a corpus of scholarly documents across 19 disciplines and state-of-the-art language modeling techniques, we learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors (also referred to as "normalization"). Then, we analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure. We report within-discipline structural archetypes, variability, and between-discipline comparisons, supporting the hypothesis that scholarly communities, despite their size, diversity, and breadth, share similar avenues for expressing their work. Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.
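
To make the position analysis described in the abstract concrete, here is a minimal Python sketch of one way to summarize where each descriptor tends to appear once section headings have been normalized. It is not the authors' implementation: the function name `mean_relative_positions`, the descriptor labels, and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mean_relative_positions(documents):
    """Map each descriptor to its mean relative position (0 = first section, 1 = last)."""
    positions = defaultdict(list)
    for sections in documents:
        last = len(sections) - 1
        for index, descriptor in enumerate(sections):
            # A single-section document has no ordering, so place it at 0.0.
            positions[descriptor].append(index / last if last > 0 else 0.0)
    return {d: sum(p) / len(p) for d, p in positions.items()}


# Hypothetical normalized documents; these labels are illustrative,
# not the descriptor set learned in the paper.
docs = [
    ["introduction", "methods", "results", "conclusion"],
    ["introduction", "results", "methods", "conclusion"],
    ["introduction", "methods", "conclusion"],
]

profile = mean_relative_positions(docs)
for descriptor, position in sorted(profile.items(), key=lambda kv: kv[1]):
    print(f"{descriptor}: {position:.2f}")
```

Grouping such profiles by discipline would give one simple basis for the within-discipline and between-discipline comparisons the abstract mentions, though the paper's actual measures may differ.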
