Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
99 tokens/sec
Gemini 2.5 Pro Premium
56 tokens/sec
GPT-5 Medium
26 tokens/sec
GPT-5 High Premium
20 tokens/sec
GPT-4o
106 tokens/sec
DeepSeek R1 via Azure Premium
99 tokens/sec
GPT OSS 120B via Groq Premium
507 tokens/sec
Kimi K2 via Groq Premium
213 tokens/sec
2000 character limit reached

SALSA: Speedy ASR-LLM Synchronous Aggregation (2408.16542v1)

Published 29 Aug 2024 in cs.CL, cs.LG, cs.SD, and eess.AS

Abstract: Harnessing pre-trained LLMs to improve ASR systems, particularly for low-resource languages, is now an emerging area of research. Existing methods range from using LLMs for ASR error correction to tightly coupled systems that replace the ASR decoder with the LLM. These approaches either increase decoding time or require expensive training of the cross-attention layers. We propose SALSA, which couples the decoder layers of the ASR to the LLM decoder, while synchronously advancing both decoders. Such coupling is performed with a simple projection of the last decoder state, and is thus significantly more training efficient than earlier approaches. A challenge of our proposed coupling is handling the mismatch between the tokenizers of the LLM and ASR systems. We handle this mismatch using cascading tokenization with respect to the LLM and ASR vocabularies. We evaluate SALSA on 8 low-resource languages in the FLEURS benchmark, yielding substantial WER reductions of up to 38%.

Citations (1)

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube