
Speaker Diarization with Lexical Information (2004.06756v1)

Published 13 Apr 2020 in eess.AS, cs.CL, and cs.SD

Abstract: This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
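
The abstract does not spell out how the adjacency matrices are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a weighted sum of a cosine-similarity matrix (acoustic) and a matrix derived from word-level speaker turn probabilities (lexical), followed by a two-speaker spectral bipartition on the Fiedler vector. The function names, the `weight` parameter, and the toy inputs are all hypothetical.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Acoustic affinity: cosine similarity between segment embeddings, clipped to [0, 1]."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, 1.0)


def lexical_adjacency(turn_prob):
    """Lexical adjacency (assumed form): neighbouring segments are linked with
    strength 1 - p, where p is the probability of a speaker turn between them."""
    n = len(turn_prob) + 1
    adj = np.eye(n)
    for i, p in enumerate(turn_prob):
        adj[i, i + 1] = adj[i + 1, i] = 1.0 - p
    return adj


def integrate(acoustic, lexical, weight=0.5):
    """Assumed integration rule: a convex combination of the two matrices."""
    return (1.0 - weight) * acoustic + weight * lexical


def spectral_bipartition(affinity):
    """Two-cluster spectral clustering: sign of the Fiedler vector of L = D - A."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)


# Toy input: six segments, the first three from one speaker, the last three from another.
embeddings = np.array([
    [1.0, 0.1], [1.0, 0.0], [0.9, 0.1],
    [0.1, 1.0], [0.0, 1.0], [0.1, 0.9],
])
turn_prob = [0.1, 0.1, 0.9, 0.1, 0.1]  # hypothetical ASR-derived turn probabilities

combined = integrate(cosine_affinity(embeddings), lexical_adjacency(turn_prob))
labels = spectral_bipartition(combined)
print(labels)
```

The cluster indices themselves are arbitrary (the sign of an eigenvector is not fixed), so only the grouping of segments is meaningful. A system with an unknown number of speakers would need more eigenvectors and a clustering step such as k-means in place of the sign test.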

Authors (7)
  1. Tae Jin Park
  2. Kyu J. Han
  3. Jing Huang
  4. Xiaodong He
  5. Bowen Zhou
  6. Panayiotis Georgiou
  7. Shrikanth Narayanan
Citations (30)