Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition (2006.10414v1)

Published 18 Jun 2020 in eess.AS and cs.SD

Abstract: Code-switching (CS) occurs when a speaker alternates words of two or more languages within a single sentence or across sentences. Automatic speech recognition (ASR) of CS speech has to deal with two or more languages at the same time. In this study, we propose a Transformer-based architecture with two symmetric language-specific encoders to capture the individual language attributes, thereby improving the acoustic representation of each language. These representations are combined using a language-specific multi-head attention mechanism in the decoder module. Each encoder and its corresponding attention module in the decoder are pre-trained on a large monolingual corpus to alleviate the impact of limited CS training data. We call such a network a multi-encoder-decoder (MED) architecture. Experiments on the SEAME corpus show that the proposed MED architecture achieves 10.2% and 10.8% relative error rate reductions on the CS evaluation sets with Mandarin and English as the matrix language, respectively.

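The abstract describes the architecture but not its implementation details. Below is a minimal PyTorch sketch of a single MED-style decoder layer, assuming two language-specific encoder memories and a simple averaged combination of the two language-specific cross-attention outputs; the module names, dimensions, and the averaging rule are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of the multi-encoder-decoder (MED) idea: two language-specific
# encoders produce separate acoustic representations, and each decoder layer
# combines them with language-specific cross-attention. Names, sizes, and the
# averaging-based combination are assumptions for illustration only.
import torch
import torch.nn as nn


class MEDDecoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # One cross-attention module per language-specific encoder.
        self.cross_attn_man = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn_eng = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, mem_man, mem_eng):
        # Self-attention over the partial transcription (causal mask omitted for brevity).
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, need_weights=False)[0])
        # Language-specific cross-attention over each encoder's output,
        # combined here by simple averaging (an illustrative assumption).
        att_man = self.cross_attn_man(x, mem_man, mem_man, need_weights=False)[0]
        att_eng = self.cross_attn_eng(x, mem_eng, mem_eng, need_weights=False)[0]
        x = self.norm2(x + 0.5 * (att_man + att_eng))
        return self.norm3(x + self.ffn(x))


if __name__ == "__main__":
    layer = MEDDecoderLayer()
    tgt = torch.randn(2, 10, 256)      # decoder inputs (batch, tokens, d_model)
    mem_man = torch.randn(2, 50, 256)  # Mandarin-specific encoder output
    mem_eng = torch.randn(2, 50, 256)  # English-specific encoder output
    print(layer(tgt, mem_man, mem_eng).shape)  # torch.Size([2, 10, 256])
```

In this sketch, pre-training would amount to initializing each encoder and its matching cross-attention weights from a monolingual model before fine-tuning on code-switching data, as the abstract outlines.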
Authors (5)
  1. Xinyuan Zhou (7 papers)
  2. Yanhua Long (21 papers)
  3. Yijie Li (23 papers)
  4. Haizhou Li (285 papers)
  5. Emre Yılmaz (18 papers)
Citations (49)