LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Abstract: We propose LauraTSE, an Auto-Regressive Decoder-Only LLM for Target Speaker Extraction built upon the LauraGPT backbone. LauraTSE employs a small-scale auto-regressive decoder-only LLM that generates the initial layers of the target speech's discrete codec representations from the continuous embeddings of both the mixture and reference speech. These outputs serve as coarse-grained predictions. To refine them, a one-step encoder-only LLM reconstructs the full codec representation by integrating information from both the mixture and the reference speech, adding fine-grained details. Our approach achieves superior or comparable performance to existing TSE models. Additionally, we conduct ablation studies to investigate the data scalability and the contribution of the encoder-only model.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.