
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction (2309.03619v2)

Published 7 Sep 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Self-supervised learning (SSL) has emerged as a promising paradigm for learning flexible speech representations from unlabeled data. By designing pretext tasks that exploit statistical regularities, SSL models can capture useful representations that transfer to downstream tasks. This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception. On downstream tasks, BT representations accelerated learning and transferred across domains. However, limitations remain in disentangling key explanatory factors: redundancy reduction and invariance alone are insufficient to factorize the learned latents into modular, compact, and informative codes. Our ablation studies isolated gains from the invariance constraint, but these gains were context-dependent. Overall, this work substantiates the potential of Barlow Twins for sample-efficient speech encoding, though challenges remain in achieving fully hierarchical representations. The analysis methodology and insights pave a path for extensions that incorporate additional inductive priors and perceptual principles to enhance the BT self-supervision framework.
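Since the abstract turns on the two terms of the Barlow Twins objective (invariance and redundancy reduction), a minimal PyTorch sketch of that loss may be helpful. It follows the general formulation of Zbontar et al. (2021), not this paper's released code; the function name, the batch and projector dimensions, and the off-diagonal weight lambda_offdiag are illustrative assumptions.

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """Barlow Twins objective for two views of the same batch (sketch).

    z_a, z_b: (N, D) projector outputs for two augmentations of the
    same N inputs. Both loss terms act on the (D, D) cross-correlation
    matrix computed across the batch dimension.
    """
    n, _ = z_a.shape
    # Standardize each embedding dimension along the batch axis.
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-9)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-9)
    # Empirical cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    # Invariance term: drive the diagonal toward 1 so both views agree.
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy-reduction term: drive off-diagonal entries toward 0 so
    # different embedding dimensions carry non-redundant information.
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag
```

As a usage example, `barlow_twins_loss(torch.randn(256, 128), torch.randn(256, 128))` returns a scalar loss; the abstract's finding is that this objective alone, while effective for transfer, does not guarantee the learned dimensions factorize into modular, compact codes.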

Authors (4)
  1. Yusuf Brima (6 papers)
  2. Ulf Krumnack (11 papers)
  3. Simone Pika (3 papers)
  4. Gunther Heidemann (8 papers)
