
MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction (2401.13249v2)

Published 24 Jan 2024 in eess.AS and cs.MM

Abstract: Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of synthetic speech. This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD), as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, which leverages predicted MOS at two key points in FAD: training data selection and model fusion. In training data selection, we demonstrate that MOS enables effective filtering of samples from unbalanced datasets. In model fusion, our results demonstrate that incorporating MOS as a gating mechanism enhances overall performance.
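
As a rough illustration of the two uses of MOS named in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the filtering rule or the form of the gate, so everything here is an assumption: the function names (`select_in_mos_range`, `mos_gate`, `gated_fusion`), the choice of a MOS range as the filter, the logistic gate, and its `midpoint` and `sharpness` parameters are all hypothetical.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_in_mos_range(
    samples: Sequence[T],
    mos_scores: Sequence[float],
    low: float,
    high: float,
) -> List[T]:
    """Keep samples whose predicted MOS lies in [low, high].

    Assumption: the paper's actual selection criterion is not given in
    the abstract; a simple range filter stands in for it here.
    """
    return [s for s, m in zip(samples, mos_scores) if low <= m <= high]


def mos_gate(mos: float, midpoint: float = 3.0, sharpness: float = 2.0) -> float:
    """Map a predicted MOS (typically 1 to 5) to a weight in (0, 1).

    Assumption: a logistic function is used purely for illustration.
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (mos - midpoint)))


def gated_fusion(score_a: float, score_b: float, mos: float) -> float:
    """Blend two detector scores, weighting score_a more as MOS rises.

    Assumption: two-model convex combination; the paper's fusion may differ.
    """
    g = mos_gate(mos)
    return g * score_a + (1.0 - g) * score_b
```

For example, `select_in_mos_range(["a", "b", "c"], [1.5, 3.2, 4.8], 3.0, 5.0)` keeps `"b"` and `"c"`, and `gated_fusion(1.0, 0.0, 3.0)` weights both detectors equally because the gate sits at its midpoint.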

Authors (7)
  1. Wangjin Zhou (6 papers)
  2. Zhengdong Yang (7 papers)
  3. Chenhui Chu (48 papers)
  4. Sheng Li (219 papers)
  5. Raj Dabre (65 papers)
  6. Yi Zhao (222 papers)
  7. Tatsuya Kawahara (61 papers)
