Characteristics that make audio tokenizers suitable for native audio language models
Determine which architectural and representational characteristics, beyond reconstruction fidelity and domain coverage, make a discrete audio tokenizer suitable as a native interface for autoregressive audio language models, so that such tokenizers can effectively support large-scale end-to-end audio language modeling.
References
Despite these advances, it remains unclear what characteristics make an audio tokenizer truly suitable for native audio LLMs.
— MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
(2602.10934 - Gong et al., 11 Feb 2026) in Related Works, Subsection “Discrete Audio Tokenizers”