The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System (2310.12378v1)

Published 18 Oct 2023 in eess.AS and cs.SD

Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) task. The system is a multi-channel, multi-speaker speech recognition system designed to transcribe speech captured by distributed microphones and microphone arrays. It comprises three main modules: the Speaker Diarization Module, the Multi-channel Audio Front-End Processing Module, and the ASR Module. Together these modules form a cascaded system that processes multi-channel, multi-speaker audio input one stage after another. This paper also describes the optimization process that significantly improved our system's performance. Our team's submission is largely based on the NeMo toolkit and will be made publicly available.
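To make the cascade described in the abstract concrete, here is a minimal Python sketch of how the three stages could be chained. It is not the NeMo implementation: `run_cascade`, `Segment`, `Utterance`, and `average_channels` are hypothetical names, each stage is passed in as a plain function, and the channel average is only a placeholder for a real multi-channel front-end. What it shows is the data flow: diarization yields speaker-labeled segments, the front-end reduces each segment's multi-channel audio to a single signal, and ASR transcribes that signal.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

MultiChannelAudio = Sequence[Sequence[float]]  # indexed as [channel][sample]


@dataclass
class Segment:
    speaker: str
    start: int  # first sample index (inclusive)
    end: int    # last sample index (exclusive)


@dataclass
class Utterance:
    speaker: str
    start: int
    end: int
    text: str


def average_channels(audio: MultiChannelAudio, seg: Segment) -> List[float]:
    # Hypothetical placeholder front-end: average all channels over the segment.
    # This is NOT the front-end described in the paper.
    n = len(audio)
    return [sum(ch[t] for ch in audio) / n for t in range(seg.start, seg.end)]


def run_cascade(
    audio: MultiChannelAudio,
    diarize: Callable[[MultiChannelAudio], List[Segment]],
    front_end: Callable[[MultiChannelAudio, Segment], List[float]],
    asr: Callable[[List[float]], str],
) -> List[Utterance]:
    """Hypothetical cascade: diarization -> per-segment front-end -> ASR."""
    utterances = []
    # Stage 1: diarization decides who spoke when; process segments in time order.
    for seg in sorted(diarize(audio), key=lambda s: s.start):
        # Stage 2: collapse the segment's multi-channel audio into one signal.
        enhanced = front_end(audio, seg)
        # Stage 3: transcribe the single-channel signal and keep the speaker label.
        utterances.append(Utterance(seg.speaker, seg.start, seg.end, asr(enhanced)))
    return utterances


if __name__ == "__main__":
    # Toy two-channel "recording" with stub diarizer and stub ASR.
    audio = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
    stub_diarize = lambda a: [Segment("B", 2, 4), Segment("A", 0, 2)]
    stub_asr = lambda sig: ",".join(f"{x:g}" for x in sig)
    for u in run_cascade(audio, stub_diarize, average_channels, stub_asr):
        print(u.speaker, u.text)  # prints "A 2,3" then "B 4,5"
```

Passing each stage as a callable keeps the modules swappable, in line with the modular breakdown above; the sketch deliberately ignores overlapping speech, which a real DASR front-end must handle.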

Authors (10)
  1. Tae Jin Park (14 papers)
  2. He Huang (97 papers)
  3. Ante Jukic (83 papers)
  4. Kunal Dhawan (22 papers)
  5. Krishna C. Puvvada (28 papers)
  6. Nithin Koluguri (4 papers)
  7. Nikolay Karpov (10 papers)
  8. Aleksandr Laptev (14 papers)
  9. Jagadeesh Balam (39 papers)
  10. Boris Ginsburg (111 papers)
Citations (4)