
The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge (2312.16002v1)

Published 26 Dec 2023 in eess.AS and cs.AI

Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems include multi-channel front-end enhancement and diarization, training data augmentation, and speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 sets, our best system achieves a relative improvement of 34.3% in CER and 56.5% in cpCER compared to the official baseline system.
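The relative improvements quoted in the abstract can be read as (baseline error - system error) / baseline error. The following is a minimal sketch of that arithmetic, assuming the standard character error rate definition (character-level edit distance divided by reference length). The function names and the example rates are hypothetical and are not taken from the paper, and the sketch does not cover the speaker-permutation handling that cpCER additionally requires.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline: float, system: float) -> float:
    """Fraction by which the system reduces the baseline error rate."""
    return (baseline - system) / baseline


# Hypothetical error rates, for illustration only (not the paper's numbers).
print(relative_improvement(0.40, 0.30))
```

A relative improvement describes a ratio, so it does not by itself reveal the absolute error rates of either system.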

