
Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems (2506.13596v1)

Published 16 Jun 2025 in cs.CL, cs.SD, and eess.AS

Abstract: This paper presents our system for the MLC-SLM Challenge 2025, focusing on multilingual speech recognition and language modeling with large language models (LLMs). Our approach combines a fine-tuned Whisper-large-v3 encoder with efficient projector architectures and various decoder configurations. We employ a three-stage training methodology that progressively optimizes the encoder, projector, and LLM components. Our system achieves competitive performance, with a private-test average WER/CER of 16.63% using Gemma3-12B and 18.6% using Qwen2.5-7B as the decoder-only LLM.
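WER (word error rate) and CER (character error rate) are both edit-distance metrics. Each counts the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis transcript into the reference, then divides by the number of reference words or characters. The sketch below computes both in plain Python as an illustration only; it is not the challenge's official scorer. The function names are hypothetical, text normalization (casing, punctuation) is omitted, and dropping spaces before character scoring is an assumption. The abstract also does not specify how per-language scores are averaged into the reported 16.63% and 18.6% figures.

```python
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token sequences."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def _tokenize(text: str, unit: str) -> List[str]:
    """Split text into words (for WER) or characters (for CER)."""
    if unit == "word":
        return text.split()
    if unit == "char":
        # Assumption: spaces are ignored when scoring at the character level.
        return list(text.replace(" ", ""))
    raise ValueError(f"unit must be 'word' or 'char', got {unit!r}")


def error_rate(ref: str, hyp: str, unit: str = "word") -> float:
    """WER (unit='word') or CER (unit='char') for a single utterance."""
    return corpus_error_rate([(ref, hyp)], unit)


def corpus_error_rate(pairs: Iterable[Tuple[str, str]], unit: str = "word") -> float:
    """Corpus-level rate: total edits divided by total reference tokens."""
    errors, total = 0, 0
    for ref, hyp in pairs:
        ref_toks, hyp_toks = _tokenize(ref, unit), _tokenize(hyp, unit)
        errors += edit_distance(ref_toks, hyp_toks)
        total += len(ref_toks)
    if total == 0:
        raise ValueError("references must contain at least one token")
    return errors / total


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 errors / 6 words
    print(error_rate("the cat sat on the mat", "the cat sit on mat"))
    # One deleted character out of four
    print(error_rate("abcd", "abd", unit="char"))
```

Pooling edits over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score. For production scoring, an established library with explicit normalization rules is preferable to this sketch.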
