StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

Published 8 Mar 2026 in cs.CL | (2603.07599v1)

Abstract: Speech LLMs (SLMs) have significantly extended the interactive capability of text-based LLMs by incorporating paralinguistic information. To provide a more realistic interactive experience with customized styles, current SLMs can interpret speaking-style intensity instructions in user prompts and control that intensity over the course of a dialogue. However, there is still no systematic benchmark that quantifies and evaluates this style intensity control ability in conversations. In this paper, we propose StyleBench, a multi-turn dialogue benchmark that comprehensively evaluates style intensity control across four dimensions: emotion, speed, volume, and pitch. Our results reveal performance gaps between leading SLMs and omni LLMs (OLMs); we discuss the likely underlying reasons and suggest promising approaches for future exploration.