
Audio-visual Recognition of Overlapped speech for the LRS2 dataset (2001.01656v1)

Published 6 Jan 2020 in eess.AS and cs.SD

Abstract: Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs of AVSR systems, i.e., end-to-end and hybrid, are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state of the art for the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest that the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system. Consistent performance improvements of 4.89% absolute in WER reduction over the baseline AVSR system using feature fusion are also obtained.
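
The abstract reports its gains as absolute WER reductions. As a reading aid, the following is a minimal sketch of how WER and an absolute reduction are conventionally computed, assuming the standard word-level edit-distance definition. The function names `word_error_rate` and `absolute_wer_reduction` are hypothetical, not taken from the paper, and the paper's own scoring setup is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumes whitespace tokenization, which is an illustrative simplification.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Absolute reduction is a plain difference in percentage points."""
    return baseline_wer - system_wer
```

Under this convention, an absolute reduction is the difference between two WER percentages in percentage points, as opposed to a relative reduction, which would divide that difference by the baseline WER.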

Authors (10)
  1. Jianwei Yu (64 papers)
  2. Shi-Xiong Zhang (48 papers)
  3. Jian Wu (314 papers)
  4. Shahram Ghorbani (7 papers)
  5. Bo Wu (144 papers)
  6. Shiyin Kang (27 papers)
  7. Shansong Liu (19 papers)
  8. Xunying Liu (92 papers)
  9. Helen Meng (204 papers)
  10. Dong Yu (328 papers)
Citations (71)