End-to-End Multi-Channel Transformer for Speech Recognition (2102.03951v1)

Published 8 Feb 2021 in eess.AS, cs.CL, and cs.SD

Abstract: Transformers are powerful neural architectures that can integrate different modalities through attention mechanisms. In this paper, we apply transformer architectures to multi-channel speech recognition, using attention layers to integrate the spectral and spatial information collected from different microphones. Our multi-channel transformer network consists of three main parts: channel-wise self-attention layers (CSA), cross-channel attention layers (CCA), and multi-channel encoder-decoder attention layers (EDA). The CSA and CCA layers encode contextual relationships across time, within each channel and between channels, respectively. The channel-attended outputs of the CSA and CCA layers are then fed into the EDA layers, which help decode the next token given the preceding ones. Experiments on an in-house far-field dataset show that our method outperforms the baseline single-channel transformer, as well as superdirective and neural beamformers cascaded with transformers.
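The three attention stages named in the abstract can be illustrated with a small plain-Python sketch. It is hypothetical and is not the authors' implementation. It uses single-head scaled dot-product attention with no learned projections, residual connections, layer normalization, or feed-forward sublayers. It also assumes that CSA output feeds CCA, that in CCA each channel queries the element-wise mean of the other channels, and that EDA averages the per-channel decoder contexts. The function names (`channel_self_attention`, `cross_channel_attention`, `multi_channel_eda`) are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps, columns are feature dimensions


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (no learned projections)."""
    d = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


def mean_matrices(ms: List[Matrix]) -> Matrix:
    # Element-wise average of equally shaped matrices.
    n = len(ms)
    rows, cols = len(ms[0]), len(ms[0][0])
    return [[sum(m[t][j] for m in ms) / n for j in range(cols)] for t in range(rows)]


def channel_self_attention(channels: List[Matrix]) -> List[Matrix]:
    # CSA: each channel attends over its own time steps.
    return [attention(x, x, x) for x in channels]


def cross_channel_attention(channels: List[Matrix]) -> List[Matrix]:
    # CCA (assumed variant): channel c queries the mean of all other channels.
    if len(channels) < 2:
        raise ValueError("cross-channel attention needs at least two channels")
    outputs = []
    for c, x in enumerate(channels):
        others = mean_matrices([y for i, y in enumerate(channels) if i != c])
        outputs.append(attention(x, others, others))
    return outputs


def multi_channel_eda(decoder_states: Matrix, encoded: List[Matrix]) -> Matrix:
    # EDA (assumed variant): attend to each channel's encoder output,
    # then average the per-channel context vectors.
    contexts = [attention(decoder_states, enc, enc) for enc in encoded]
    return mean_matrices(contexts)


if __name__ == "__main__":
    ch1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, 2 features
    ch2 = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
    encoded = cross_channel_attention(channel_self_attention([ch1, ch2]))
    context = multi_channel_eda([[1.0, 0.0]], encoded)  # one decoder query
    print(len(encoded), len(encoded[0]), len(context), len(context[0]))  # 2 3 1 2
```

Averaging is only one way to fuse channels in this sketch. A learned weighting or concatenation would fit the same interface, and the abstract does not say which fusion the authors use.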

Authors (5)
  1. Feng-Ju Chang (15 papers)
  2. Martin Radfar (17 papers)
  3. Athanasios Mouchtaris (31 papers)
  4. Brian King (16 papers)
  5. Siegfried Kunzmann (13 papers)
Citations (32)