Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice (2311.15582v1)

Published 27 Nov 2023 in cs.SD, cs.LG, and eess.AS

Abstract: The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed tool in clinical voice quality assessment that is significant for streamlining communication among clinical professionals and for benchmarking decisions about further treatment. Currently, because the assessment relies on experienced clinicians, it tends to be inconsistent and difficult to standardize. To address this problem, we propose leveraging lightly weighted automatic audio parameter extraction to increase the clinical relevance, reduce the complexity, and enhance the interpretability of voice quality assessment. The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing. A classical machine learning approach is employed. The results reveal that our approach performs similarly to state-of-the-art (SOTA) methods and outperforms latent representations obtained from popular pre-trained audio models. This approach provides insight into the feasibility of different feature extraction approaches for voice evaluation. Audio parameters such as jitter and HNR are shown to be suitable for characterizing voice quality attributes such as roughness and strain. Conversely, pre-trained models exhibit limitations in effectively addressing noise-related scoring. This study contributes toward more comprehensive and precise voice quality evaluation by comprehensively exploring diverse assessment methodologies.
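The pipeline described in the abstract can be approximated with standard open-source tooling. The sketch below is a minimal illustration, not the authors' implementation: it assumes Praat (via the parselmouth package) for jitter, absolute jitter, shimmer, and HNR, a waveform-based zero-crossing rate, and a random-forest regressor as a stand-in for the unspecified "classical machine learning approach". The Praat parameter values, the pitch range, and the use of CAPE-V attribute scores as regression targets are assumptions as well.

```python
# Sketch of a lightly weighted audio-parameter pipeline (not the authors' code).
import numpy as np
import parselmouth
from parselmouth.praat import call
from sklearn.ensemble import RandomForestRegressor


def extract_features(wav_path: str, age: float, sex: int) -> np.ndarray:
    """Return [age, sex, jitter, absolute jitter, shimmer, HNR, zero-crossing rate]."""
    snd = parselmouth.Sound(wav_path)

    # Glottal pulse sequence used by Praat's jitter/shimmer measures.
    # 75-500 Hz is a common pitch floor/ceiling assumption, not taken from the paper.
    pulses = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter_local = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    jitter_abs = call(pulses, "Get jitter (local, absolute)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, pulses], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)

    # Harmonic-to-noise ratio, averaged over the recording.
    harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
    hnr = call(harmonicity, "Get mean", 0, 0)

    # Zero-crossing rate computed directly from the raw waveform.
    samples = snd.values[0]
    zcr = np.mean(np.abs(np.diff(np.sign(samples))) > 0)

    return np.array([age, sex, jitter_local, jitter_abs, shimmer, hnr, zcr])


# Hypothetical training setup: one feature row per recording, with a CAPE-V
# attribute score (0-100), e.g. overall severity, as the regression target.
# X = np.stack([extract_features(path, age, sex) for path, age, sex in recordings])
# model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, capev_scores)
```

Concatenating age and sex with the five acoustic parameters yields a compact, seven-dimensional feature vector in which every dimension has a direct clinical interpretation, which is the interpretability advantage the abstract contrasts with pre-trained latent representations.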

