AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition (2403.11578v1)

Published 18 Mar 2024 in eess.AS

Abstract: In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but also ensures that, as training proceeds, predictions display the desired model confidence.
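To make the abstract's idea concrete, the following is a minimal sketch of a CTC loss combined with an entropy bonus whose weight can change from step to step. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the entropy is computed frame by frame as a simplification (the abstract does not say which entropy is regularized), and `placeholder_weight` is an arbitrary linear decay standing in for whatever adaptive rule AdaMER actually uses. The forward algorithm runs in the probability domain, so it is only suitable for short toy inputs.

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Return -log p(target | probs) using the CTC forward algorithm.

    probs:  list of T per-frame distributions (each a list summing to 1).
    target: list of label indices, without blanks.
    """
    # Extended sequence with blanks interleaved: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    # alpha[s]: total probability of all partial paths ending at ext[s]
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)


def mean_frame_entropy(probs):
    """Average Shannon entropy (in nats) of the per-frame distributions."""
    total = 0.0
    for frame in probs:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(probs)


def regularized_loss(probs, target, weight):
    """CTC loss minus a weighted entropy bonus (assumed frame-wise entropy)."""
    return ctc_neg_log_likelihood(probs, target) - weight * mean_frame_entropy(probs)


def placeholder_weight(step, total_steps, start=0.2):
    """Hypothetical linear decay; NOT the AdaMER update rule."""
    return start * (1.0 - step / total_steps)
```

With a constant `weight`, `regularized_loss` corresponds to the fixed-weight setting the abstract attributes to earlier work; passing `placeholder_weight(step, total_steps)` instead shows where a step-dependent weight would enter the objective. A peaky posterior such as `[[0.0, 1.0], [1.0, 0.0]]` has zero frame entropy, so the bonus term only rewards flatter distributions.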
