
HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model (2310.03975v1)

Published 6 Oct 2023 in cs.SD and cs.CL

Abstract: Recently, the usefulness of self-supervised representation learning (SSRL) methods has been confirmed for various downstream tasks. Many of these models, as exemplified by HuBERT and WavLM, use pseudo-labels generated from spectral features or from the model's own representation features. Previous studies have shown that these pseudo-labels contain semantic information. However, the masked prediction task, HuBERT's learning criterion, focuses on local contextual information and may not make effective use of global semantic information such as speaker identity or the theme of the speech. In this paper, we propose a new approach to enrich the semantic representation of HuBERT. We apply a topic model to the pseudo-labels to generate a topic label for each utterance, and add an auxiliary topic classification task to HuBERT that uses these topic labels as teachers. This allows additional global semantic information to be incorporated in an unsupervised manner. Experimental results demonstrate that our method achieves comparable or better performance than the baseline in most tasks, including automatic speech recognition and five of the eight SUPERB tasks. Moreover, we find that the topic labels capture various kinds of information about an utterance, such as gender, speaker, and theme. This highlights the effectiveness of our approach in capturing multifaceted semantic nuances.
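
As a rough illustration of the core idea, the sketch below treats each utterance's sequence of HuBERT pseudo-labels (cluster IDs) as a "document", fits a topic model over those documents, and takes the most probable topic as an utterance-level label. This is a minimal sketch, not the authors' implementation: the toy data, the use of scikit-learn's LDA as the topic model, and the two-topic setting are assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the paper's code): derive an
# utterance-level topic label from HuBERT pseudo-label sequences.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy pseudo-label sequences: one whitespace-separated string of
# cluster IDs per utterance (in practice these would come from
# clustering HuBERT features).
utterances = [
    "12 12 47 47 47 3 3 88",
    "5 5 5 61 61 12 12 12",
    "88 88 3 3 47 47 47 12",
]

# Bag-of-clusters representation: each pseudo-label ID acts as a "word".
vectorizer = CountVectorizer(token_pattern=r"\S+")
counts = vectorizer.fit_transform(utterances)

# Fit a topic model over the pseudo-label "documents" (LDA here as a
# stand-in; the paper's specific topic model may differ).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # shape: (n_utterances, n_topics)

# The most probable topic becomes the utterance-level teacher label
# for an auxiliary topic-classification task added on top of HuBERT.
topic_labels = doc_topic.argmax(axis=1)
print(topic_labels)  # e.g. [0 1 0]
```

In the full method, these topic labels would serve as teachers for an auxiliary classification head trained jointly with HuBERT's masked-prediction objective, injecting utterance-level (global) information without any manual annotation.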

Authors (5)
  1. Takashi Maekaku (9 papers)
  2. Jiatong Shi (82 papers)
  3. Xuankai Chang (61 papers)
  4. Yuya Fujita (16 papers)
  5. Shinji Watanabe (416 papers)
Citations (1)
