
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings (2211.00511v3)

Published 1 Nov 2022 in eess.AS and cs.SD

Abstract: Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks. It was shown that single-channel frame-level diarization with serialized output training (SC-FD-SOT), single-channel word-level diarization with SOT (SC-WD-SOT) and joint training of single-channel target-speaker separation and ASR (SC-TS-ASR) can be exploited to partially solve this problem. In this paper, we propose three corresponding multichannel (MC) SA-ASR approaches, namely MC-FD-SOT, MC-WD-SOT and MC-TS-ASR. For different tasks/models, different multichannel data fusion strategies are considered, including channel-level cross-channel attention for MC-FD-SOT, frame-level cross-channel attention for MC-WD-SOT and neural beamforming for MC-TS-ASR. Results on the AliMeeting corpus reveal that our proposed models consistently outperform the corresponding single-channel counterparts in terms of the speaker-dependent character error rate.
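The abstract reports results as a speaker-dependent character error rate but does not define it here. As a rough illustration only, the sketch below assumes the metric is the sum of per-speaker character edit distances divided by the total number of reference characters, with each hypothesis compared against the reference of the same speaker label. The function names `edit_distance` and `speaker_dependent_cer`, the dictionary layout, and the handling of extra hypothesis speakers are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def speaker_dependent_cer(refs: Dict[str, str], hyps: Dict[str, str]) -> float:
    """Assumed definition: per-speaker edit errors over total reference characters."""
    total_errors = 0
    total_chars = 0
    for speaker, ref in refs.items():
        # A speaker with no hypothesis counts as all deletions.
        total_errors += edit_distance(ref, hyps.get(speaker, ""))
        total_chars += len(ref)
    for speaker, hyp in hyps.items():
        # Assumption: text attributed to an unknown speaker counts as insertions.
        if speaker not in refs:
            total_errors += len(hyp)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    return total_errors / total_chars


if __name__ == "__main__":
    refs = {"spk1": "abcd", "spk2": "xyz"}
    hyps = {"spk1": "abed", "spk2": "xy"}
    print(speaker_dependent_cer(refs, hyps))
```

Under this assumed definition, a transcript that is correct but attributed to the wrong speaker is penalized, which is what separates a speaker-dependent score from a plain character error rate.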

Authors (7)
  1. Mohan Shi
  2. Jie Zhang
  3. Zhihao Du
  4. Fan Yu
  5. Qian Chen
  6. Shiliang Zhang
  7. Li-Rong Dai
Citations (4)
