
Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation (2007.12903v1)

Published 25 Jul 2020 in cs.SD and eess.AS

Abstract: For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs, which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective is sub-optimal and insufficient for fully training the front-end, which still leaves room for improvement. In this paper, we propose a novel approach which incorporates flow-based density estimation for the robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms the conventional techniques where the front-end is trained only with the ASR objective.
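
The abstract does not describe the flow model itself. As a generic illustration of what flow-based density estimation means (not the authors' architecture), the sketch below evaluates an exact log-likelihood through the change-of-variables rule, log p(x) = log p_Z(f(x)) + log |det df/dx|, for the simplest invertible map, an element-wise affine transform. The function name `affine_flow_log_prob` and its parameters `scale` and `shift` are hypothetical and chosen only for this example.

```python
import numpy as np

def affine_flow_log_prob(x, scale, shift):
    """Log-density of x under an element-wise affine flow.

    Assumed toy model: z = (x - shift) / scale with a standard normal base
    distribution. Real flow-based models stack many learned invertible layers.
    """
    x = np.asarray(x, dtype=float)
    z = (x - shift) / scale                            # forward map f(x)
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))   # log N(z; 0, 1) per dimension
    log_det = -np.log(np.abs(scale))                   # log |dz/dx| per dimension
    return np.sum(log_base + log_det, axis=-1)         # sum over feature dimensions
```

Because the likelihood is exact, a model of this kind can be fit to clean speech features alone by maximizing the log-probability, which is what makes density estimation usable when paired clean and noisy recordings are unavailable.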

Authors (5)
  1. Hyeongju Kim
  2. Hyeonseung Lee
  3. Woo Hyun Kang
  4. Hyung Yong Kim
  5. Nam Soo Kim
Citations (22)
