Multi-Encoder-Decoder Transformer for Code-Switching Speech Recognition (2006.10414v1)

Published 18 Jun 2020 in eess.AS and cs.SD

Abstract: Code-switching (CS) occurs when a speaker alternates words of two or more languages within a single sentence or across sentences. Automatic speech recognition (ASR) of CS speech has to deal with two or more languages at the same time. In this study, we propose a Transformer-based architecture with two symmetric language-specific encoders to capture the individual language attributes, thereby improving the acoustic representation of each language. These representations are combined using a language-specific multi-head attention mechanism in the decoder module. Each encoder and its corresponding attention module in the decoder are pre-trained on a large monolingual corpus to alleviate the impact of limited CS training data. We call such a network a multi-encoder-decoder (MED) architecture. Experiments on the SEAME corpus show that the proposed MED architecture achieves 10.2% and 10.8% relative error rate reductions on the CS evaluation sets with Mandarin and English as the matrix language, respectively.

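The abstract describes the architecture but not its implementation details. Below is a minimal PyTorch sketch of a single MED-style decoder layer, assuming two language-specific encoder memories and a simple averaged combination of the two language-specific cross-attention outputs; the module names, dimensions, and the averaging rule are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of the multi-encoder-decoder (MED) idea: two language-specific
# encoders produce separate acoustic representations, and each decoder layer
# combines them with language-specific cross-attention. Names, sizes, and the
# averaging-based combination are assumptions for illustration only.
import torch
import torch.nn as nn


class MEDDecoderLayer(nn.Module):
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # One cross-attention module per language-specific encoder.
        self.cross_attn_man = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn_eng = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, mem_man, mem_eng):
        # Self-attention over the partial transcription (causal mask omitted for brevity).
        x = self.norm1(tgt + self.self_attn(tgt, tgt, tgt, need_weights=False)[0])
        # Language-specific cross-attention over each encoder's output,
        # combined here by simple averaging (an illustrative assumption).
        att_man = self.cross_attn_man(x, mem_man, mem_man, need_weights=False)[0]
        att_eng = self.cross_attn_eng(x, mem_eng, mem_eng, need_weights=False)[0]
        x = self.norm2(x + 0.5 * (att_man + att_eng))
        return self.norm3(x + self.ffn(x))


if __name__ == "__main__":
    layer = MEDDecoderLayer()
    tgt = torch.randn(2, 10, 256)      # decoder inputs (batch, tokens, d_model)
    mem_man = torch.randn(2, 50, 256)  # Mandarin-specific encoder output
    mem_eng = torch.randn(2, 50, 256)  # English-specific encoder output
    print(layer(tgt, mem_man, mem_eng).shape)  # torch.Size([2, 10, 256])
```

In this sketch, pre-training would amount to initializing each encoder and its matching cross-attention weights from a monolingual model before fine-tuning on code-switching data, as the abstract outlines.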
Authors (5)
  1. Xinyuan Zhou (7 papers)
  2. Yanhua Long (21 papers)
  3. Yijie Li (23 papers)
  4. Haizhou Li (285 papers)
  5. Emre Yılmaz (18 papers)
Citations (49)