DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module (2409.00481v3)

Published 31 Aug 2024 in eess.AS and cs.SD

Abstract: Speech recognition is the technology that enables machines to interpret and process human speech, converting spoken language into text or commands. It is essential for applications such as virtual assistants, transcription services, and communication tools. Audio-Visual Speech Recognition (AVSR) enhances traditional speech recognition, particularly in noisy environments, by incorporating visual modalities such as lip movements and facial expressions. Although traditional AVSR models with many parameters trained on large-scale datasets can achieve remarkable accuracy, often surpassing human performance, they incur high training costs and are difficult to deploy. To address these issues, we introduce an efficient AVSR model that reduces the number of parameters by integrating a Dual Conformer Interaction Module (DCIM). In addition, we propose a pre-training method that further optimizes model performance by selectively updating parameters, yielding significant gains in efficiency. Unlike conventional models, which must learn the hierarchical relationship between the audio and visual modalities on their own, our approach builds this distinction directly into the model architecture. This design improves both efficiency and performance, making it a more practical and effective solution for AVSR tasks.
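The abstract mentions a pre-training method that "selectively updates parameters" but does not say which parameters are updated or how they are chosen. The sketch below is a hypothetical, generic illustration of selective updating (a plain SGD step applied only to named parameter groups while the rest stay frozen), not the authors' method. The group names `audio_encoder`, `visual_encoder`, and `dcim`, the function `selective_sgd_step`, and the learning rate are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical layout: each parameter group is a flat list of floats.
Params = Dict[str, List[float]]


def selective_sgd_step(params: Params, grads: Params,
                       trainable: Set[str], lr: float = 0.1) -> Params:
    """Return new parameters where only groups in `trainable` are updated.

    Frozen groups are copied unchanged; the input dict is not mutated.
    This is a generic freezing sketch, not the DCIM-AVSR training procedure.
    """
    updated: Params = {}
    for name, values in params.items():
        if name in trainable:
            # Plain SGD update: w <- w - lr * g
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Frozen group: keep the current values (copied, not aliased)
            updated[name] = list(values)
    return updated


if __name__ == "__main__":
    # Hypothetical parameter groups; the real model layout is not given in the abstract.
    params = {"audio_encoder": [1.0, 2.0], "visual_encoder": [0.5], "dcim": [0.0, 0.0]}
    grads = {"audio_encoder": [1.0, 1.0], "visual_encoder": [1.0], "dcim": [1.0, -1.0]}
    new = selective_sgd_step(params, grads, trainable={"dcim"}, lr=0.5)
    print(new)  # only "dcim" changes: [-0.5, 0.5]
```

In a real framework the same idea is usually expressed by disabling gradients on the frozen parameters, so only the selected subset consumes optimizer state and compute.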

Authors (6)
  1. Xinyu Wang (186 papers)
  2. Qian Wang (453 papers)
  3. Haotian Jiang (43 papers)
  4. Haolin Huang (2 papers)
  5. Yu Fang (30 papers)
  6. Mengjie Xu (5 papers)