
BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence (2312.16893v1)

Published 28 Dec 2023 in cs.CL and cs.AI

Abstract: Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts. In this paper, we posit that coherent texts inherently manifest a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To explore this abstract relationship, we introduce the "BBScore," a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our findings showcase that when synergized with a simple additional classification component, this metric attains a performance level comparable to state-of-the-art techniques on standard artificial discrimination tasks. We also establish in downstream tasks that this metric effectively differentiates between human-written documents and text generated by LLMs under a specific domain. Furthermore, we illustrate the efficacy of this approach in detecting written styles attributed to diverse LLMs, underscoring its potential for generalizability. In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training. This flexibility allows for its application in various downstream tasks.
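For context, the following is a minimal sketch of the standard Brownian bridge relation that the metric is grounded in; the paper's exact BBScore definition, latent sentence encoder, and any normalization are not reproduced here, and the notation (embeddings $z_0, \dots, z_T$ indexed by sentence position) is assumed for illustration. A Brownian bridge pinned at the first and last latent states $z_0$ and $z_T$ assigns each intermediate position $t$ the Gaussian conditional

$$ p(z_t \mid z_0, z_T) = \mathcal{N}\!\left( \left(1 - \tfrac{t}{T}\right) z_0 + \tfrac{t}{T}\, z_T,\; \tfrac{t(T - t)}{T}\, I \right), \qquad 0 < t < T. $$

Intuitively, a reference-free coherence score can then be derived from how well the observed trajectory of sentence embeddings follows this expected interpolation between the document's start and end: a coherent document should track the bridge closely, while a shuffled or incoherent one deviates from it.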

Authors (4)
  1. Zhecheng Sheng (8 papers)
  2. Tianhao Zhang (29 papers)
  3. Chen Jiang (94 papers)
  4. Dongyeop Kang (72 papers)
Citations (3)
