Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation (2307.12231v1)

Published 23 Jul 2023 in cs.SD, cs.CL, and eess.AS

Abstract: Neural speech separation has made remarkable progress, and its integration with automatic speech recognition (ASR) is an important direction towards realizing multi-speaker ASR. This work provides an insightful investigation of speech separation in reverberant and noisy-reverberant scenarios as an ASR front-end. In detail, we explore multi-channel separation methods (mask-based beamforming and complex spectral mapping) as well as the best features to use in the ASR back-end model. We employ the recent self-supervised learning representation (SSLR) as a feature and improve recognition performance over filterbank features. To further improve multi-speaker recognition performance, we present a carefully designed training strategy for integrating speech separation and recognition with SSLR. The proposed integration using TF-GridNet-based complex spectral mapping and WavLM-based SSLR achieves a 2.5% word error rate on the reverberant WHAMR! test set, significantly outperforming an existing integration of mask-based MVDR beamforming and filterbank features (28.9%).

Authors (8)
  1. Yoshiki Masuyama (30 papers)
  2. Xuankai Chang (61 papers)
  3. Wangyou Zhang (35 papers)
  4. Samuele Cornell (41 papers)
  5. Zhong-Qiu Wang (41 papers)
  6. Nobutaka Ono (22 papers)
  7. Yanmin Qian (97 papers)
  8. Shinji Watanabe (416 papers)
Citations (5)
