Papers
Topics
Authors
Recent
Search
2000 character limit reached

LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models

Published 10 Apr 2025 in cs.LG and cs.AI | (2504.07402v2)

Abstract: We propose LauraTSE, an Auto-Regressive Decoder-Only LLM for Target Speaker Extraction built upon the LauraGPT backbone. LauraTSE employs a small-scale auto-regressive decoder-only LLM that generates the initial layers of the target speech's discrete codec representations from the continuous embeddings of both the mixture and reference speech. These outputs serve as coarse-grained predictions. To refine them, a one-step encoder-only LLM reconstructs the full codec representation by integrating information from both the mixture and the reference speech, adding fine-grained details. Our approach achieves superior or comparable performance to existing TSE models. Additionally, we conduct ablation studies to investigate the data scalability and the contribution of the encoder-only model.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.