A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One (2302.09908v2)

Published 20 Feb 2023 in cs.SD, cs.AI, cs.CL, cs.LG, and eess.AS

Abstract: Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains challenging. Recent research revealed that an ASR model's encoder captures different levels of information at different layers -- the lower layers tend to carry more acoustic information, and the upper layers more linguistic information. This inspires us to develop a Sidecar separator to empower a well-trained ASR model for multi-talker scenarios by separating the mixed speech embedding between two suitable layers. We experimented with a wav2vec 2.0-based ASR model with a Sidecar mounted. By freezing the parameters of the original model and training only the Sidecar (8.7 M, 8.4% of all parameters), the proposed approach outperforms the previous state-of-the-art by a large margin on the 2-speaker mixed LibriMix dataset, reaching a word error rate (WER) of 10.36%, and obtains comparable results (7.56%) on the LibriSpeechMix dataset with limited training data.
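
The abstract outlines the core mechanism: a small separator module is mounted between a lower (more acoustic) and an upper (more linguistic) layer of a frozen, pre-trained ASR encoder, and only that module is trained. The sketch below is a hypothetical, minimal PyTorch illustration of such a mask-based separator; the class name, layer sizes, and mask formulation are assumptions for illustration and not the paper's exact design.

```python
import torch
import torch.nn as nn


class Sidecar(nn.Module):
    """Hypothetical sketch of a Sidecar-style separator: a small convolutional
    module that turns one mixed-speech embedding sequence into per-speaker
    embeddings via learned masks."""

    def __init__(self, dim: int, num_speakers: int = 2, hidden: int = 256):
        super().__init__()
        self.num_speakers = num_speakers
        # Predict one mask per speaker over the hidden dimension.
        self.mask_net = nn.Sequential(
            nn.Conv1d(dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, dim * num_speakers, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, time, dim) -- the mixed embedding from a lower encoder layer
        h = x.transpose(1, 2)                        # (batch, dim, time)
        masks = self.mask_net(h)                     # (batch, dim * spk, time)
        masks = masks.view(x.size(0), self.num_speakers, -1, x.size(1))
        masks = torch.sigmoid(masks)                 # one soft mask per speaker
        # Apply each mask to the shared mixed embedding; the upper encoder
        # layers then process each separated stream independently.
        return [(h * masks[:, s]).transpose(1, 2) for s in range(self.num_speakers)]


# Usage sketch: freeze the pre-trained encoder, run its lower layers on the
# mixture, separate with the Sidecar, then run the upper layers per speaker.
sidecar = Sidecar(dim=768, num_speakers=2)
mixed_embedding = torch.randn(1, 200, 768)           # placeholder lower-layer output
per_speaker = sidecar(mixed_embedding)                # list of two (1, 200, 768) tensors
```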

Authors (6)
  1. Lingwei Meng (31 papers)
  2. Jiawen Kang (204 papers)
  3. Mingyu Cui (31 papers)
  4. Yuejiao Wang (10 papers)
  5. Xixin Wu (85 papers)
  6. Helen Meng (204 papers)
Citations (16)

