
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese (2505.11200v1)

Published 16 May 2025 in cs.SD, cs.AI, cs.CL, cs.HC, cs.LG, and eess.AS

Abstract: Recent advances in LLMs have significantly improved text-to-speech (TTS) systems, enhancing control over speech style, naturalness, and emotional expression, which brings TTS systems closer to human-level performance. Although the Mean Opinion Score (MOS) remains the standard for TTS system evaluation, it suffers from subjectivity, environmental inconsistencies, and limited interpretability. Existing evaluation datasets also lack a multi-dimensional design, often neglecting factors such as speaking styles, context diversity, and trap utterances, which is particularly evident in Chinese TTS evaluation. To address these challenges, we introduce the Audio Turing Test (ATT), which pairs a multi-dimensional Chinese corpus, ATT-Corpus, with a simple, Turing-Test-inspired evaluation protocol. Instead of relying on complex MOS scales or direct model comparisons, ATT asks evaluators to judge whether a voice sounds human. This simplification reduces rating bias and improves evaluation robustness. To further support rapid model development, we also finetune Qwen2-Audio-Instruct with human judgment data as Auto-ATT for automatic evaluation. Experimental results show that ATT effectively differentiates models across specific capability dimensions using its multi-dimensional design. Auto-ATT also demonstrates strong alignment with human evaluations, confirming its value as a fast and reliable assessment tool. The white-box ATT-Corpus and Auto-ATT can be found in the ATT Hugging Face Collection (https://huggingface.co/collections/meituan/audio-turing-test-682446320368164faeaf38a4).
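
As a rough illustration of the protocol described in the abstract, binary "sounds human" judgments could be aggregated into a per-system, per-dimension rate. This is only a minimal sketch under stated assumptions: the abstract does not specify the scoring formula, so the simple fraction of "human" votes used here, the function name `human_likeness_rates`, and the system and dimension labels are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def human_likeness_rates(judgments):
    """Aggregate binary judgments into a rate per (system, dimension).

    `judgments` is an iterable of (system, dimension, judged_human) tuples,
    where judged_human is True if the evaluator thought the clip was human.
    Assumption: the score is the plain fraction of "human" votes; the paper
    may define its metric differently.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [human_votes, total_votes]
    for system, dimension, judged_human in judgments:
        entry = counts[(system, dimension)]
        entry[0] += int(judged_human)
        entry[1] += 1
    return {key: human / total for key, (human, total) in counts.items()}


# Hypothetical sample data; labels are illustrative only.
sample = [
    ("tts_a", "dialogue", True),
    ("tts_a", "dialogue", False),
    ("tts_a", "poetry", True),
    ("tts_b", "dialogue", False),
]
rates = human_likeness_rates(sample)
for key in sorted(rates):
    print(key, rates[key])
```

Keying the result by both system and dimension mirrors the abstract's point that the multi-dimensional design lets ATT separate models on specific capabilities instead of collapsing everything into one number.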

Authors (12)
  1. Xihuai Wang (11 papers)
  2. Ziyi Zhao (25 papers)
  3. Siyu Ren (24 papers)
  4. Shao Zhang (18 papers)
  5. Song Li (95 papers)
  6. Xiaoyu Li (348 papers)
  7. Ziwen Wang (37 papers)
  8. Lin Qiu (47 papers)
  9. Guanglu Wan (24 papers)
  10. Xuezhi Cao (24 papers)
  11. Xunliang Cai (63 papers)
  12. Weinan Zhang (322 papers)
