
HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch (2210.09951v1)

Published 18 Oct 2022 in cs.SD and eess.AS

Abstract: In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability for generating high-quality time alignments between the speech signal and the transcription, which can be crucial for many subsequent applications. Moreover, we propose several methods to improve the convergence of from-scratch full-sum training by addressing the alignment modeling issue. A systematic comparison is conducted on both the Switchboard and LibriSpeech corpora across CTC, posterior HMM with and without transition probabilities, and standard hybrid HMM. We also provide a detailed analysis of both Viterbi forced alignments and Baum-Welch full-sum occupation probabilities.
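The full-sum criterion discussed in the abstract marginalizes over all valid alignment paths between the label sequence and the speech frames. For the CTC topology, this marginalization can be computed with the standard forward algorithm over the blank-extended label sequence. The sketch below (a minimal, illustrative implementation; the function name and list-of-lists input format are assumptions, not the paper's code) shows that computation in pure Python:

```python
import math

def ctc_forward_logprob(log_probs, labels, blank=0):
    """Full-sum log-probability of `labels` under a CTC topology:
    sums over all valid alignment paths via the forward algorithm.

    log_probs: list of T frames, each a list of V per-label log-posteriors.
    labels: target label sequence (without blanks).
    """
    # Blank-extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S, T = len(ext), len(log_probs)
    NEG = float("-inf")

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG:
            return NEG
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[s]: log-prob of all partial paths ending in state s at the current frame
    alpha = [NEG] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG] * S
        for s in range(S):
            cands = [alpha[s]]          # loop transition
            if s > 0:
                cands.append(alpha[s - 1])  # step transition
            # Skip transition allowed only between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end in the final label or the trailing blank
    return logsumexp(alpha[S - 1], alpha[S - 2] if S > 1 else NEG)
```

For example, with two frames, a vocabulary of {blank, 1}, and uniform 0.5 posteriors, the label sequence [1] is covered by the three paths "blank,1", "1,blank", and "1,1", giving a total probability of 0.75. The per-state forward messages from this recursion are also what the posterior-HMM variants compared in the paper reweight with transition probabilities.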

Authors (5)
  1. Tina Raissi (12 papers)
  2. Wei Zhou (312 papers)
  3. Simon Berger (8 papers)
  4. Ralf Schlüter (73 papers)
  5. Hermann Ney (104 papers)
Citations (12)
