
Study on the Correlation between Objective Evaluations and Subjective Speech Quality and Intelligibility (2307.04517v2)

Published 10 Jul 2023 in eess.AS

Abstract: Subjective tests are the gold standard for evaluating speech quality and intelligibility; however, they are time-consuming and expensive. Thus, objective measures that align with human perceptions are crucial. This study evaluates the correlation between commonly used objective measures and subjective speech quality and intelligibility using a Chinese speech dataset. Moreover, new objective measures are proposed that combine current objective measures using deep learning techniques to predict subjective quality and intelligibility. The proposed deep learning model reduces the amount of training data without significantly affecting prediction performance. We analyzed the deep learning model to understand how objective measures reflect subjective quality and intelligibility. We also explored the impact of including subjective speech quality ratings on speech intelligibility prediction. Our findings offer valuable insights into the relationship between objective measures and human perceptions.
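As a rough illustration of the kind of analysis the abstract describes, the sketch below computes the linear (Pearson) and rank (Spearman) correlation between an objective measure and subjective listening scores. The abstract does not specify which correlation coefficients the authors used, so the choice of these two is an assumption, and the score values and the names `objective_scores` and `subjective_scores` are hypothetical placeholders, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-utterance scores (illustrative only, not from the paper).
objective_scores = [1.2, 1.8, 2.5, 3.1, 3.9]   # e.g. an objective quality measure
subjective_scores = [1.5, 2.0, 2.4, 3.6, 4.2]  # e.g. mean listener ratings

lcc = pearson(objective_scores, subjective_scores)
srcc = spearman(objective_scores, subjective_scores)
print(f"Pearson: {lcc:.3f}, Spearman: {srcc:.3f}")
```

Reporting both coefficients is a common design choice: Pearson reflects how linearly the objective measure tracks the ratings, while Spearman only checks whether the two orderings agree.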
