Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition (2303.00802v1)

Published 1 Mar 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Awareness of biased ASR datasets and models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion model (ACM) that transforms native US-English speech into accented pronunciation. We include phonetic knowledge in the ACM training to provide accurate feedback about how well certain pronunciation patterns were recovered in the synthesized waveform. Furthermore, we investigate the feasibility of learned accent representations instead of static embeddings. The generated data was then used to train two state-of-the-art ASR systems. We evaluated our approach on native and non-native English datasets and found that synthetically accented data helped the ASR systems better understand speech from seen accents. This improvement did not carry over to unseen accents, nor was it observed for a model that had been pre-trained exclusively on native speech.

Authors (8)
  1. Philipp Klumpp (5 papers)
  2. Pooja Chitkara (8 papers)
  3. Prashant Serai (6 papers)
  4. Jilong Wu (8 papers)
  5. Irina-Elena Veliche (6 papers)
  6. Rongqing Huang (7 papers)
  7. Qing He (88 papers)
  8. Leda Sarı (6 papers)
Citations (1)