
REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling (2012.07353v2)

Published 14 Dec 2020 in eess.AS, cs.AI, and cs.SD

Abstract: Accent mismatch is a critical problem for end-to-end ASR. This paper addresses it by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions. Motivated by this proof of equivalence, we introduce reDAT, a novel technique based on DAT that relabels data using either unsupervised clustering or soft labels. Experiments on 23K hours of multi-accent data show that DAT performs competitively with accent-specific baselines on both native and non-native English accents and achieves up to 13% relative WER reduction on unseen accents; reDAT yields further relative improvements over DAT of 3% and 8% on non-native accents of American and British English.
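The abstract ties gradient reversal to minimizing the Jensen-Shannon divergence between domain output distributions. As a rough illustration of that quantity only (not the paper's training procedure), the sketch below computes the JS divergence between two discrete distributions. The function names and the toy distributions are assumptions made for this example; they do not come from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical output distributions for two accent domains (toy values).
accent_a = [0.7, 0.2, 0.1]
accent_b = [0.1, 0.3, 0.6]

print(js_divergence(accent_a, accent_b))  # positive: the distributions differ
print(js_divergence(accent_a, accent_a))  # zero: identical distributions
```

The divergence is zero when the two distributions coincide and is bounded above by log 2 (in nats), which it reaches when they have disjoint support. An accent-invariant representation corresponds to driving this value toward zero.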

Authors (9)
  1. Hu Hu (18 papers)
  2. Xuesong Yang (18 papers)
  3. Zeynab Raeesy (6 papers)
  4. Jinxi Guo (15 papers)
  5. Gokce Keskin (10 papers)
  6. Harish Arsikere (7 papers)
  7. Ariya Rastrow (55 papers)
  8. Andreas Stolcke (57 papers)
  9. Roland Maas (24 papers)
Citations (30)
