Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems (2306.14608v1)

Published 26 Jun 2023 in eess.AS and cs.CL

Abstract: Rich sources of variability in natural speech present significant challenges to current data-intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-environment adaptive training and test-time adaptation approach for Conformer ASR models. Speaker and environment level characteristics are separately modeled using compact hidden output transforms, which are then linearly or hierarchically combined to represent any speaker-environment combination. Bayesian learning is further utilized to model the adaptation parameter uncertainty. Experiments on the 300-hr WHAM noise-corrupted Switchboard data suggest that factorised adaptation consistently outperforms the baseline and speaker-label-only adapted Conformers by up to 3.1% absolute (10.4% relative) word error rate reductions. Further analysis shows the proposed method offers potential for rapid adaptation to unseen speaker-environment conditions.
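
For intuition, below is a minimal, hypothetical sketch (PyTorch, not the authors' code) of the kind of factorised hidden-output adaptation the abstract describes: one compact transform per speaker and one per environment, applied to a Conformer layer's hidden outputs and combined linearly. All class names, shapes, and the learnable interpolation weight `alpha` are illustrative assumptions; the paper's hierarchical combination and Bayesian modelling of adaptation parameter uncertainty are not shown here.

```python
# Sketch only: factorised speaker-environment hidden-output adaptation.
# Names, shapes, and the combination scheme are assumptions for illustration.
import torch
import torch.nn as nn

class FactorisedAdapter(nn.Module):
    """Applies a compact speaker-specific and environment-specific scaling
    transform to Conformer hidden outputs, combined by a learnable linear
    interpolation (one illustrative reading of the linear combination)."""

    def __init__(self, hidden_dim, num_speakers, num_environments):
        super().__init__()
        # One compact (diagonal) scaling vector per speaker / environment.
        self.speaker_scale = nn.Embedding(num_speakers, hidden_dim)
        self.env_scale = nn.Embedding(num_environments, hidden_dim)
        nn.init.ones_(self.speaker_scale.weight)   # start as identity scaling
        nn.init.ones_(self.env_scale.weight)
        # Learnable interpolation weight between the two factors (assumed).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, hidden, speaker_id, env_id):
        # hidden: (batch, time, hidden_dim)
        s = self.speaker_scale(speaker_id).unsqueeze(1)  # (batch, 1, dim)
        e = self.env_scale(env_id).unsqueeze(1)          # (batch, 1, dim)
        # Linear combination of the two factorised transforms.
        scale = self.alpha * s + (1.0 - self.alpha) * e
        return hidden * scale

# Usage with made-up sizes: 8 utterances, 50 frames, 256-dim hidden outputs.
adapter = FactorisedAdapter(hidden_dim=256, num_speakers=100, num_environments=4)
h = torch.randn(8, 50, 256)
spk = torch.randint(0, 100, (8,))
env = torch.randint(0, 4, (8,))
out = adapter(h, spk, env)   # (8, 50, 256), adapted hidden outputs
```

Because the speaker and environment factors are modeled separately, any unseen speaker-environment pairing can in principle be represented by re-combining already-estimated factors, which is the basis for the rapid adaptation claim.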

Authors (9)
  1. Jiajun Deng (75 papers)
  2. Guinan Li (23 papers)
  3. Xurong Xie (38 papers)
  4. Zengrui Jin (30 papers)
  5. Mingyu Cui (31 papers)
  6. Tianzi Wang (37 papers)
  7. Shujie Hu (36 papers)
  8. Mengzhe Geng (42 papers)
  9. Xunying Liu (92 papers)
Citations (1)
