
Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space

Published 16 May 2024 in cs.CL and cs.AI (arXiv:2405.09765v1)

Abstract: We present HyperSum, an extractive summarization framework that captures both the efficiency of traditional lexical summarization and the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when randomly initializing vectors at extremely high dimensions ("blessing of dimensionality") to construct representative and efficient sentence embeddings. Simply clustering the obtained embeddings and extracting their medoids yields competitive summaries. HyperSum often outperforms state-of-the-art summarizers -- in terms of both summary accuracy and faithfulness -- while being 10 to 100 times faster. We open-source HyperSum as a strong baseline for unsupervised extractive summarization.
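The pipeline the abstract describes — deterministic random bipolar hypervectors per token, bundling them into sentence embeddings, then clustering and extracting medoids — can be sketched as below. This is a minimal illustration of the general technique, not the authors' released implementation: the dimensionality, the tie-breaking rule, and the naive PAM-style medoid search are all assumptions for the sketch (the paper uses a fast Rust/Python k-medoids library).

```python
import hashlib
import numpy as np

DIM = 10_000  # at this dimensionality, random vectors are pseudo-orthogonal

def token_hv(token, dim=DIM):
    """Deterministic random bipolar {-1, +1} hypervector for a token."""
    seed = int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return rng.choice([-1, 1], size=dim)

def sentence_hv(sentence):
    """Bundle token hypervectors by summation, then binarize (majority sign)."""
    bundled = np.sum([token_hv(t) for t in sentence.lower().split()], axis=0)
    return np.sign(bundled + 0.1)  # +0.1 breaks zero-count ties toward +1

def summarize(sentences, k=1):
    """Cluster sentence hypervectors; return the k medoid sentences."""
    X = np.stack([sentence_hv(s) for s in sentences])
    dist = 1 - (X @ X.T) / X.shape[1]  # cosine-style distance on bipolar vectors
    n = len(sentences)
    medoids = list(range(k))
    improved = True
    while improved:  # naive PAM-style swap search, fine for small inputs
        improved = False
        assign = np.argmin(dist[:, medoids], axis=1)
        cost = dist[np.arange(n), [medoids[a] for a in assign]].sum()
        for m_i in range(k):
            for cand in range(n):
                trial = medoids.copy()
                trial[m_i] = cand
                a = np.argmin(dist[:, trial], axis=1)
                c = dist[np.arange(n), [trial[j] for j in a]].sum()
                if c < cost - 1e-9:
                    medoids, cost, improved = trial, c, True
    return [sentences[m] for m in sorted(medoids)]
```

Because extraction reduces to medoid lookup over fixed random projections, there is no model to train or load, which is the source of the speedup the abstract claims over neural summarizers.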

