AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition (2403.11578v1)
Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but also ensures that, as training proceeds, predictions display the desired model confidence.
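The abstract's idea can be sketched concretely: subtract an entropy bonus from the CTC loss, with a weight that decays as training proceeds so early updates discourage peaky distributions while late updates allow confident predictions. The sketch below is a minimal NumPy illustration, not the paper's implementation; the linear decay schedule and the `lam_max`/`lam_min` values are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the vocabulary axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_entropy(logits):
    """Mean per-frame entropy of the output distributions (shape: T x V)."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def adamer_weight(step, total_steps, lam_max=0.2, lam_min=0.0):
    """Hypothetical linearly decaying regularization weight.

    The paper only states that the weight is modulated over training;
    this linear schedule is an illustrative assumption."""
    frac = min(step / max(total_steps, 1), 1.0)
    return lam_max + frac * (lam_min - lam_max)

def adamer_loss(ctc_loss, logits, step, total_steps):
    """Combined objective: CTC loss minus a decaying entropy bonus.

    Early in training the entropy bonus penalizes peaky (low-entropy)
    distributions; as the weight decays, predictions may sharpen."""
    lam = adamer_weight(step, total_steps)
    return ctc_loss - lam * frame_entropy(logits)
```

In a real system `ctc_loss` would come from a CTC implementation (e.g. a deep-learning framework's CTC loss over the same logits); here it is passed in as a scalar to keep the sketch self-contained.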