
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling (2209.15483v2)

Published 30 Sep 2022 in cs.CL, cs.LG, and eess.AS

Abstract: Generative Spoken Language Modeling research focuses on optimizing speech language models (LMs) using raw audio recordings without accessing any textual supervision. Such speech LMs usually operate over discrete units obtained from quantizing internal representations of self-supervised models. Although such units show impressive modeling results, their robustness capabilities have not been extensively investigated. This work focuses on improving the robustness of discrete input representations for generative spoken language modeling. First, we formally define how to measure the robustness of such representations to various signal variations that do not alter the spoken information (e.g., time-stretch). Next, we empirically demonstrate how current state-of-the-art representation models lack robustness to such variations. To overcome this, we propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling. The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme. Our method significantly improves over the evaluated baselines when considering encoding and modeling metrics. We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines.
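The abstract does not spell out its robustness measure, so the following is only an illustrative sketch of one plausible way to quantify how much a sequence of discrete units changes under a content-preserving transformation: a length-normalized edit distance between the units of the clean signal and the units of the transformed signal. The function names (`edit_distance`, `collapse_repeats`, `unit_robustness_gap`) and the choice to collapse consecutive repeated units are assumptions made for this example, not the paper's definitions.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two unit sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def unit_robustness_gap(clean_units: Sequence[int],
                        augmented_units: Sequence[int],
                        dedupe: bool = True) -> float:
    """Edit distance normalized by the clean sequence length (hypothetical metric).

    0.0 means the transformation left the units unchanged; larger is less robust.
    """
    clean = collapse_repeats(clean_units) if dedupe else list(clean_units)
    aug = collapse_repeats(augmented_units) if dedupe else list(augmented_units)
    if not clean:
        raise ValueError("clean_units must not be empty")
    return edit_distance(clean, aug) / len(clean)
```

Collapsing repeats is a design choice for this sketch: a time-stretch changes the number of frames, so comparing run-collapsed sequences avoids penalizing a representation merely for emitting the same unit over more or fewer frames.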

Authors (8)
  1. Itai Gat (30 papers)
  2. Felix Kreuk (22 papers)
  3. Tu Anh Nguyen (12 papers)
  4. Ann Lee (29 papers)
  5. Jade Copet (26 papers)
  6. Gabriel Synnaeve (97 papers)
  7. Emmanuel Dupoux (81 papers)
  8. Yossi Adi (96 papers)
Citations (10)
