
The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge (2006.07898v1)

Published 14 Jun 2020 in eess.AS and cs.SD

Abstract: This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each stage of the pipeline, such as multi-array guided source separation (GSS) for enhancement and acoustic model training data, posterior fusion for speech activity detection, PLDA score fusion for diarization, and lattice combination for automatic speech recognition (ASR). We also report results with different acoustic model architectures, and integrate other techniques such as online multi-channel weighted prediction error (WPE) dereverberation and variational Bayes-hidden Markov model (VB-HMM) based overlap assignment to deal with reverberation and overlapping speakers, respectively. As a result of these efforts, our ASR systems achieve word error rates of 40.5% and 67.5% on the evaluation set for tracks 1 and 2, respectively. These are absolute improvements of 10.8% and 10.4% over the challenge baselines for the respective tracks.
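
The word error rates (WER) and absolute gains quoted in the abstract can be put side by side with a little arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the baseline WERs are not stated in the abstract but only implied by adding each absolute gain back to the reported system WER.

```python
# Figures quoted in the abstract: (system WER %, absolute improvement %).
REPORTED = {
    "track 1": (40.5, 10.8),
    "track 2": (67.5, 10.4),
}


def implied_baseline(system_wer: float, absolute_gain: float) -> float:
    """Baseline WER implied by a system WER and its absolute improvement.

    Assumption: "improvement of X% absolute" means baseline - system = X.
    """
    return system_wer + absolute_gain


def relative_reduction(system_wer: float, absolute_gain: float) -> float:
    """Relative WER reduction (in percent) with respect to the implied baseline."""
    return 100.0 * absolute_gain / implied_baseline(system_wer, absolute_gain)


if __name__ == "__main__":
    for track, (wer, gain) in REPORTED.items():
        base = implied_baseline(wer, gain)
        rel = relative_reduction(wer, gain)
        print(f"{track}: implied baseline {base:.1f}%, "
              f"system {wer:.1f}%, relative reduction {rel:.1f}%")
```

Under this reading, the same absolute gain corresponds to a smaller relative reduction on track 2, because its implied baseline error rate is higher.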

Authors (10)
  1. Ashish Arora
  2. Desh Raj
  3. Aswin Shanmugam Subramanian
  4. Ke Li
  5. Bar Ben-Yair
  6. Matthew Maciejewski
  7. Shinji Watanabe
  8. Sanjeev Khudanpur
  9. Piotr Żelasko
  10. Paola García
Citations (9)
