
Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings (2409.06222v1)

Published 10 Sep 2024 in cs.CL, cs.SD, and eess.AS

Abstract: Recent advancements in speech-based topic segmentation have highlighted the potential of pretrained speech encoders to capture semantic representations directly from speech. Traditionally, topic segmentation has relied on a pipeline approach in which transcripts of the automatic speech recognition systems are generated, followed by text-based segmentation algorithms. In this paper, we introduce an end-to-end scheme that bypasses this conventional two-step process by directly employing semantic speech encoders for segmentation. Focused on the broadcasted news domain, which poses unique challenges due to the diversity of speakers and topics within single recordings, we address the challenge of accessing topic change points efficiently in an end-to-end manner. Furthermore, we propose a new benchmark for spoken news topic segmentation by utilizing a dataset featuring approximately 1000 hours of publicly available recordings across six European languages and including an evaluation set in Hindi to test the model's cross-domain performance in a cross-lingual, zero-shot scenario. This setup reflects real-world diversity and the need for models adapting to various linguistic settings. Our results demonstrate that while the traditional pipeline approach achieves a state-of-the-art $P_k$ score of 0.2431 for English, our end-to-end model delivers a competitive $P_k$ score of 0.2564. When trained multilingually, these scores further improve to 0.1988 and 0.2370, respectively. To support further research, we release our model along with data preparation scripts, facilitating open research on multilingual spoken news topic segmentation.
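The $P_k$ scores quoted in the abstract are segmentation error rates, so lower is better. As a rough illustration of how such a score can be computed, here is a minimal sketch of the standard $P_k$ definition. The abstract does not say which implementation the authors used, so the function name `pk_score`, the per-unit segment-id input format, and the default window size (half the mean reference segment length, the usual convention) are all assumptions made for this example.

```python
def pk_score(reference, hypothesis, k=None):
    """Compute the P_k segmentation error (lower is better).

    reference, hypothesis: equal-length sequences giving a segment id for
    each unit (e.g. sentence or audio window). Hypothetical input format,
    chosen for this sketch only.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    n = len(reference)

    if k is None:
        # Conventional default: half the mean reference segment length.
        num_segments = 1 + sum(
            1 for a, b in zip(reference, reference[1:]) if a != b
        )
        k = max(1, round(n / (2 * num_segments)))
    if k >= n:
        raise ValueError("window size k must be smaller than sequence length")

    # Slide a probe of width k and count positions where reference and
    # hypothesis disagree on whether both ends lie in the same segment.
    errors = 0
    for i in range(n - k):
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    ref = [0, 0, 0, 0, 1, 1, 1, 1]   # one true topic change in the middle
    hyp = [0, 0, 0, 0, 0, 0, 0, 0]   # hypothesis that misses the change
    print(pk_score(ref, ref))
    print(pk_score(ref, hyp))
```

A perfect hypothesis scores 0, and each missed or spurious boundary raises the score in proportion to the number of probe windows it affects.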

