
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting (2308.16488v1)

Published 31 Aug 2023 in eess.AS and cs.SD

Abstract: Automatic Mean Opinion Score (MOS) prediction is crucial for evaluating the perceptual quality of synthetic speech. While recent approaches using pre-trained self-supervised learning (SSL) models have shown promising results, they only partly address the data scarcity issue for the feature extractor. This leaves the data scarcity issue for the decoder unresolved and leads to suboptimal performance. To address this challenge, we propose a retrieval-augmented MOS prediction method, dubbed RAMP, to strengthen the decoder against data scarcity. A fusing network is also proposed to dynamically adjust the retrieval scope for each instance and the fusion weights based on the predictive confidence. Experimental results show that our proposed method outperforms the existing methods in multiple scenarios.
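The idea of blending a model's own MOS estimate with scores retrieved from similar labeled utterances can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract describes a learned fusing network, whereas the hand-written rule below simply widens the retrieval scope and shifts weight toward retrieval as confidence drops. The names `knn_mos`, `fuse`, `max_k`, and the `(embedding, mos)` datastore layout are all hypothetical.

```python
import math

def knn_mos(query, datastore, k):
    """Distance-weighted average MOS of the k stored items nearest to query.

    datastore is a list of (embedding, mos) pairs (hypothetical layout).
    """
    nearest = sorted(datastore, key=lambda item: math.dist(query, item[0]))[:k]
    # Closer neighbors get larger weights via exp(-distance).
    weights = [math.exp(-math.dist(query, emb)) for emb, _ in nearest]
    total = sum(weights)
    return sum(w * mos for w, (_, mos) in zip(weights, nearest)) / total

def fuse(parametric_mos, confidence, query, datastore, max_k=4):
    """Blend the model's prediction with a retrieved estimate.

    Assumed rule (not from the paper): with confidence in [0, 1], lower
    confidence means more neighbors are retrieved and the retrieved
    estimate receives more weight.
    """
    k = max(1, round((1.0 - confidence) * max_k))
    retrieved = knn_mos(query, datastore, k)
    return confidence * parametric_mos + (1.0 - confidence) * retrieved
```

With full confidence the result is the model's own prediction, and with zero confidence it is the retrieved estimate alone. In the method the abstract describes, both the scope and the weights come from a trained fusing network rather than a fixed formula.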

Authors (4)
  1. Hui Wang (371 papers)
  2. Shiwan Zhao (48 papers)
  3. Xiguang Zheng (7 papers)
  4. Yong Qin (36 papers)
Citations (7)
