Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data (1904.03802v2)

Published 8 Apr 2019 in cs.CL

Abstract: The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence promotes the ASR model to code-switch easily between languages. Specifically, we propose to use Jensen-Shannon divergence and cosine distance based constraints. The former enforces the output embeddings of the monolingual languages to possess similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the high effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
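
The abstract gives only the shape of the two constraints, so the following is a minimal PyTorch-style sketch rather than the authors' implementation. The centroid term follows directly from the cosine-distance description; the JSD term assumes the two embedding distributions are compared via soft assignments to a shared set of anchor directions, and the anchor construction, function names, and weighting are all hypothetical.

```python
import torch
import torch.nn.functional as F

def centroid_cosine_loss(emb_a, emb_b):
    """Cosine distance between the centroids of two output-embedding matrices.

    emb_a: (vocab_a, dim) output token embeddings of language A.
    emb_b: (vocab_b, dim) output token embeddings of language B.
    Minimizing this pulls the two centroids together, matching the
    cosine-distance constraint described in the abstract.
    """
    c_a = emb_a.mean(dim=0)
    c_b = emb_b.mean(dim=0)
    return 1.0 - F.cosine_similarity(c_a, c_b, dim=0)

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two discrete distributions p and q."""
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    m = 0.5 * (p + q)
    kl_pm = (p * (p / m).log()).sum()
    kl_qm = (q * (q / m).log()).sum()
    return 0.5 * (kl_pm + kl_qm)

def jsd_embedding_loss(emb_a, emb_b, anchors):
    """Hypothetical JSD constraint: compare each language's average
    soft assignment over shared anchor directions of shape (K, dim).
    The paper only states that a JSD constraint is applied to the
    output-embedding distributions; this anchor scheme is an assumption.
    """
    p = F.softmax(emb_a @ anchors.t(), dim=-1).mean(dim=0)
    q = F.softmax(emb_b @ anchors.t(), dim=-1).mean(dim=0)
    return js_divergence(p, q)
```

In training, such terms would presumably be added to the monolingual ASR loss with tuning weights, e.g. loss = asr_loss + alpha * jsd_embedding_loss(...) + beta * centroid_cosine_loss(...); the weights alpha and beta are assumptions, not values from the paper.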

Authors (7)
  1. Yerbolat Khassanov (19 papers)
  2. Haihua Xu (23 papers)
  3. Van Tung Pham (13 papers)
  4. Zhiping Zeng (6 papers)
  5. Eng Siong Chng (112 papers)
  6. Chongjia Ni (18 papers)
  7. Bin Ma (78 papers)
Citations (19)
