LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders (2211.10999v2)

Published 20 Nov 2022 in cs.SD, cs.CV, cs.LG, and eess.AS

Abstract: Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use spectral mapping/masking to reproduce the clean audio, often resulting in visual backbones added to existing speech enhancement architectures. In this work, we propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture, and then converts them into waveform audio using a neural vocoder (HiFi-GAN). We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference. Our experiments show that LA-VocE outperforms existing methods according to multiple metrics, particularly under very noisy scenarios.
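The abstract refers to studying different levels of background noise, which are conventionally expressed as a signal-to-noise ratio (SNR) in decibels. As a rough illustration of what a "low-SNR" input means, the sketch below scales a noise signal so that the speech-to-noise power ratio hits a target value before the two are added. This is a generic, standard-library-only example and not the authors' data pipeline: the function names `mix_at_snr` and `measured_snr_db`, the 16 kHz sample rate, the sine-wave stand-in for speech, and the -5 dB target are all assumptions made for illustration.

```python
import math
import random


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`
    (in dB), then add it to `speech`.

    Returns the noisy mixture and the scaled noise.
    Hypothetical helper for illustration only.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / P_noise)  =>  P_noise = P_speech / 10^(SNR_dB / 10)
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Assumed setup: one second at 16 kHz, a 220 Hz tone standing in for speech.
rng = random.Random(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# At -5 dB the noise carries more power than the speech.
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-5.0)
```

Calling `measured_snr_db(speech, scaled_noise)` on the result recovers the requested SNR up to floating-point error. Negative values mean the noise is louder than the speech.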

Authors (7)
  1. Rodrigo Mira (13 papers)
  2. Buye Xu (27 papers)
  3. Jacob Donley (19 papers)
  4. Anurag Kumar (118 papers)
  5. Stavros Petridis (64 papers)
  6. Vamsi Krishna Ithapu (24 papers)
  7. Maja Pantic (100 papers)
Citations (11)