Cross-Modal Knowledge Distillation for Speech Large Language Models (2509.14930v1)

Published 18 Sep 2025 in cs.CL and cs.AI

Abstract: In this work, we present the first systematic evaluation of catastrophic forgetting and modality inequivalence in speech LLMs, showing that introducing speech capabilities can degrade knowledge and reasoning even when inputs remain textual, and performance further decreases with spoken queries. To address these challenges, we propose a cross-modal knowledge distillation framework that leverages both text-to-text and speech-to-text channels to transfer knowledge from a text-based teacher model to a speech LLM. Extensive experiments on dialogue and audio understanding tasks validate the effectiveness of our approach in preserving textual knowledge, improving cross-modal alignment, and enhancing reasoning in speech-based interactions.
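The abstract describes the distillation objective only at a high level. Below is a minimal sketch of what a two-channel (text-to-text and speech-to-text) logit-distillation loss could look like; the function names, the temperature, and the alpha weighting are illustrative assumptions for this sketch, not the paper's reported setup.

```python
# Illustrative sketch (not the paper's exact implementation): a frozen
# text-only teacher supervises a speech LLM student on both the
# text-to-text and speech-to-text channels via KL divergence over logits.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL distillation between teacher and student token logits."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

def cross_modal_kd_loss(student_text_logits, student_speech_logits,
                        teacher_logits, alpha=0.5, temperature=2.0):
    """Combine the text-to-text and speech-to-text distillation channels.

    The single shared teacher and the fixed alpha weighting are assumptions
    made for this sketch.
    """
    loss_t2t = distillation_loss(student_text_logits, teacher_logits, temperature)
    loss_s2t = distillation_loss(student_speech_logits, teacher_logits, temperature)
    return alpha * loss_t2t + (1.0 - alpha) * loss_s2t
```

In this framing, the teacher's soft targets anchor the student's textual knowledge (mitigating catastrophic forgetting) while the speech-to-text channel pulls the spoken-input representations toward the same output distribution, which is one way to encourage the cross-modal alignment the abstract mentions.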
