Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization (2401.06980v1)
Abstract: In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs a lower-level and an upper-level optimization with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, we conducted extensive experiments on the LibriSpeech and TED-LIUM v2 datasets. BL-JUST achieves superior performance over the commonly used strategy of pre-training followed by fine-tuning.
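The bilevel structure described in the abstract can be written out as follows. This is a sketch under assumed notation that does not appear in the abstract: $\theta$ stands for the parameters shared by both objectives, $\phi$ for parameters used only by the supervised objective, $\mathcal{L}_{\mathrm{sup}}$ and $\mathcal{L}_{\mathrm{unsup}}$ for the supervised and unsupervised losses, and $\gamma > 0$ for a penalty weight. The second line shows one common value-function penalty reformulation; the paper's exact formulation may differ.

```latex
% Assumed notation (not from the abstract): \theta, \phi, \gamma,
% \mathcal{L}_{\mathrm{sup}}, \mathcal{L}_{\mathrm{unsup}}.
\begin{align}
  &\min_{\theta,\phi}\; \mathcal{L}_{\mathrm{sup}}(\theta,\phi)
   \quad \text{s.t.} \quad
   \theta \in \arg\min_{\theta'} \mathcal{L}_{\mathrm{unsup}}(\theta')
   && \text{(bilevel problem)} \\
  &\min_{\theta,\phi}\; \mathcal{L}_{\mathrm{sup}}(\theta,\phi)
   + \gamma \Bigl( \mathcal{L}_{\mathrm{unsup}}(\theta)
   - \min_{\theta'} \mathcal{L}_{\mathrm{unsup}}(\theta') \Bigr)
   && \text{(penalized single-level problem)}
\end{align}
```

Because the inner minimum is a constant with respect to $\theta$ and $\phi$, a gradient step on the penalized problem needs only $\nabla \mathcal{L}_{\mathrm{sup}} + \gamma \nabla_{\theta} \mathcal{L}_{\mathrm{unsup}}$, which is consistent with the abstract's claim of affordable complexity.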