
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge (2401.03473v3)

Published 7 Jan 2024 in cs.SD, cs.AI, and eess.AS

Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. The challenge provides over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, the first-place team, USTCiflytek, achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, an absolute improvement of 13.08% and 51.4% over our challenge baseline, respectively.
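
The two metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the challenge's official scoring script. It assumes CER is the character-level edit distance divided by the reference length, and that cpCER concatenates each speaker's utterances and takes the speaker permutation with the fewest total errors. The function names `edit_distance`, `cer`, and `cpcer` are hypothetical. Text normalization and mismatched speaker counts are left out.

```python
from itertools import permutations
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)


def cpcer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Concatenated minimum-permutation CER (sketch).

    refs and hyps each hold one concatenated transcript per speaker.
    Assumption: both have the same number of speakers.
    """
    total_ref_chars = sum(len(r) for r in refs)
    # Try every assignment of hypothesis speakers to reference speakers
    # and keep the one with the fewest total character errors.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_chars


if __name__ == "__main__":
    print(cer("abcd", "abxd"))
    # Hypothesis speakers are listed in swapped order; the permutation search fixes that.
    print(cpcer(["abcd", "xyz"], ["xyz", "abxd"]))
```

Because the permutation search is factorial in the number of speakers, this brute-force form only suits a handful of speakers per session. Reading the reported gains as absolute differences, the abstract's figures imply baseline scores of roughly 13.16 + 13.08 = 26.24% CER and 21.48 + 51.4 = 72.88% cpCER.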

Authors (16)
  1. He Wang (295 papers)
  2. Pengcheng Guo (55 papers)
  3. Yue Li (219 papers)
  4. Ao Zhang (45 papers)
  5. Jiayao Sun (9 papers)
  6. Lei Xie (339 papers)
  7. Wei Chen (1293 papers)
  8. Pan Zhou (221 papers)
  9. Hui Bu (25 papers)
  10. Xin Xu (188 papers)
  11. Binbin Zhang (47 papers)
  12. Zhuo Chen (319 papers)
  13. Jian Wu (315 papers)
  14. Longbiao Wang (46 papers)
  15. Eng Siong Chng (112 papers)
  16. Sun Li (5 papers)
Citations (4)
