The USTC-NERCSLIP Systems for The ICMC-ASR Challenge (2407.02052v1)
Abstract: This report describes the system submitted to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which addresses the ASR task with multi-speaker overlap and Mandarin accent dynamics in the ICMC scenario. In the front end, we perform speaker diarization using a multi-speaker embedding based on self-supervised learning representations, and beamforming using the speaker position. For ASR, we employ an iterative pseudo-label generation method based on a fusion model to obtain text labels for unsupervised data. To mitigate the impact of accent, we propose an Accent-ASR framework that captures pronunciation-related accent features at a fine-grained level and linguistic information at a coarse-grained level. On the ICMC-ASR eval set, the proposed system achieves a CER of 13.16% on track 1 and a cpCER of 21.48% on track 2, significantly outperforming the official baseline system and ranking first on both tracks.
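For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how a character error rate (CER) and a concatenated minimum-permutation CER (cpCER) are commonly computed. It is an illustration only, not the challenge's official scoring tool: the function names `edit_distance`, `cer`, and `cp_cer` are hypothetical, no text normalization is applied, and the sketch assumes the reference and hypothesis contain the same number of speakers.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def cp_cer(refs, hyps):
    """Sketch of cpCER: each entry is one speaker's concatenated transcript.

    Assumes len(refs) == len(hyps). Every assignment of hypothesis
    speakers to reference speakers is tried and the lowest total
    error is kept.
    """
    total_chars = sum(len(r) for r in refs)
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_chars
```

With this sketch, swapping the speaker order of otherwise perfect hypotheses leaves the cpCER at zero, which is the property that separates it from a plain CER when speaker labels from diarization are arbitrary.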