Effectiveness of Speech-LLMs for Users with Inherently Disfluent Speech

Determine whether users with inherently disfluent speech can interact effectively with Speech Large Language Models (Speech-LLMs).

Background

The paper argues that prior benchmarks emphasize linguistic competence or environmental noise but devote less attention to speech-internal phenomena such as phoneme structure, prosodic patterns, and overlapping speech. Because of this gap, it is uncertain whether individuals whose speech is inherently disfluent can effectively use Speech-LLMs.

To address this, the authors introduce VocalBench-DF, a benchmark designed to systematically evaluate Speech-LLMs under a wide range of disfluency conditions. The open question motivates both rigorous evaluation and methods that make Speech-LLMs accessible to users with disfluent speech.

References

Yet, these efforts devote less attention to speech-internal phenomena, leaving open the question of whether users with inherently disfluent speech can interact effectively with Speech-LLMs.

VocalBench-DF: A Benchmark for Evaluating Speech LLM Robustness to Disfluency (2510.15406 - Liu et al., 17 Oct 2025) in Related Works, Section 2