Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
90 tokens/sec
Gemini 2.5 Pro Premium
54 tokens/sec
GPT-5 Medium
19 tokens/sec
GPT-5 High Premium
18 tokens/sec
GPT-4o
104 tokens/sec
DeepSeek R1 via Azure Premium
78 tokens/sec
GPT OSS 120B via Groq Premium
475 tokens/sec
Kimi K2 via Groq Premium
225 tokens/sec
2000 character limit reached

STAB: Speech Tokenizer Assessment Benchmark (2409.02384v1)

Published 4 Sep 2024 in cs.CL, cs.SD, and eess.AS

Abstract: Representing speech as discrete tokens provides a framework for transforming speech into a format that closely resembles text, thus enabling the use of speech as an input to the widely successful LLMs. Currently, while several speech tokenizers have been proposed, there is ambiguity regarding the properties that are desired from a tokenizer for specific downstream tasks and its overall generalizability. Evaluating the performance of tokenizers across different downstream tasks is a computationally intensive effort that poses challenges for scalability. To circumvent this requirement, we present STAB (Speech Tokenizer Assessment Benchmark), a systematic evaluation framework designed to assess speech tokenizers comprehensively and shed light on their inherent characteristics. This framework provides a deeper understanding of the underlying mechanisms of speech tokenization, thereby offering a valuable resource for expediting the advancement of future tokenizer models and enabling comparative analysis using a standardized benchmark. We evaluate the STAB metrics and correlate this with downstream task performance across a range of speech tasks and tokenizer choices.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.