
Multichannel End-to-end Speech Recognition (1703.04783v1)

Published 14 Mar 2017 in cs.SD and cs.CL

Abstract: The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.

Authors (4)
  1. Tsubasa Ochiai (43 papers)
  2. Shinji Watanabe (416 papers)
  3. Takaaki Hori (41 papers)
  4. John R. Hershey (40 papers)
Citations (92)
