Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

MIRNet: Learning multiple identities representations in overlapped speech (2008.01698v2)

Published 4 Aug 2020 in eess.AS and cs.SD

Abstract: Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are multiple concurrent speakers in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from an overlapped speech. We design a network that can extract a high-level embedding that contains information about each speaker's identity from a given mixture. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm only requires the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Hyewon Han (10 papers)
  2. Soo-Whan Chung (21 papers)
  3. Hong-Goo Kang (36 papers)
Citations (8)

Summary

We haven't generated a summary for this paper yet.