On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition (2203.14593v3)
Abstract: Accurate recognition of dysarthric and elderly speech remains a challenging task to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, creates large diversity among these speakers. Scarcity of speaker-level data limits the practical use of data-intensive model-based speaker adaptation methods. To this end, this paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods: variance-regularized spectral basis embedding (SVR) and spectral feature driven f-LHUC transforms. Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest the proposed on-the-fly speaker adaptation approaches consistently outperform baseline iVector adapted hybrid DNN/TDNN and E2E Conformer systems by statistically significant WER reductions of 2.48%-2.85% absolute (7.92%-8.06% relative), and offline model-based LHUC adaptation by 1.82% absolute (5.63% relative) respectively.
- Mengzhe Geng (42 papers)
- Xurong Xie (38 papers)
- Rongfeng Su (5 papers)
- Jianwei Yu (64 papers)
- Zengrui Jin (30 papers)
- Tianzi Wang (37 papers)
- Shujie Hu (36 papers)
- Zi Ye (20 papers)
- Helen Meng (204 papers)
- Xunying Liu (92 papers)