
Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction (2306.09313v1)

Published 15 Jun 2023 in eess.AS, cs.AI, cs.CL, and cs.LG

Abstract: Speaker diarization (SD) is typically used with an automatic speech recognition (ASR) system to ascribe speaker labels to recognized words. The conventional approach reconciles outputs from independently optimized ASR and SD systems, where the SD system typically uses only acoustic information to identify the speakers in the audio stream. This approach can lead to speaker errors, especially around speaker turns and regions of speaker overlap. In this paper, we propose a novel second-pass speaker error correction system using lexical information, leveraging the power of modern language models (LMs). Our experiments across multiple telephony datasets show that our approach is both effective and robust. Training and tuning only on the Fisher dataset, this error correction approach leads to relative word-level diarization error rate (WDER) reductions of 15-30% on three telephony datasets: RT03-CTS, Callhome American English and held-out portions of Fisher.
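
To make the "relative WDER reduction" figure concrete, here is a minimal sketch of how such a number could be computed. It is not the authors' evaluation code. It assumes the recognized words have already been aligned one-to-one with reference words, so WDER reduces to the fraction of aligned words carrying the wrong speaker label. The function names `wder` and `relative_reduction` and the toy labels are hypothetical.

```python
from typing import Sequence


def wder(ref_speakers: Sequence[str], hyp_speakers: Sequence[str]) -> float:
    """Fraction of aligned words whose hypothesized speaker label is wrong.

    Assumption: both sequences hold one speaker label per aligned word,
    in the same word order.
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("speaker label sequences must be aligned word-for-word")
    if not ref_speakers:
        raise ValueError("need at least one aligned word")
    wrong = sum(r != h for r, h in zip(ref_speakers, hyp_speakers))
    return wrong / len(ref_speakers)


def relative_reduction(baseline: float, corrected: float) -> float:
    """Relative improvement of `corrected` over a non-zero `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - corrected) / baseline


# Toy data (made up for illustration): errors cluster around speaker turns.
reference = ["A", "A", "A", "B", "B", "B", "A", "A", "B", "B"]
first_pass = ["A", "A", "B", "B", "B", "A", "A", "A", "B", "B"]
second_pass = ["A", "A", "A", "B", "B", "A", "A", "A", "B", "B"]

baseline_wder = wder(reference, first_pass)
corrected_wder = wder(reference, second_pass)
print(f"baseline WDER:      {baseline_wder:.2f}")
print(f"corrected WDER:     {corrected_wder:.2f}")
print(f"relative reduction: {relative_reduction(baseline_wder, corrected_wder):.0%}")
```

A relative reduction is measured against the first-pass error rate, so a 15-30% relative reduction means the corrected WDER is 70-85% of the baseline WDER, not that the error rate falls by 15-30 absolute points.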

Authors (3)
  1. Rohit Paturi (9 papers)
  2. Sundararajan Srinivasan (16 papers)
  3. Xiang Li (1003 papers)
Citations (12)
