Non Intrusive Intelligibility Predictor for Hearing Impaired Individuals using Self Supervised Speech Representations (2307.13423v3)

Published 25 Jul 2023 in cs.SD, cs.LG, and eess.AS

Abstract: Self-supervised speech representations (SSSRs) have been successfully applied to a number of speech-processing tasks, e.g. as feature extractors for speech quality (SQ) prediction, which is, in turn, relevant for assessing and training speech enhancement systems for users with normal or impaired hearing. However, why and how quality-related information is encoded well in such representations remains poorly understood. In this work, techniques for non-intrusive prediction of SQ ratings are extended to the prediction of intelligibility for hearing-impaired users. It is found that self-supervised representations are useful as input features to non-intrusive prediction models, achieving performance competitive with more complex systems. A detailed analysis of how performance depends on the Clarity Prediction Challenge 1 listeners and enhancement systems indicates that more data might be needed to allow generalisation to unknown systems and (hearing-impaired) individuals.
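
To make the idea of a non-intrusive predictor concrete, the following is a minimal sketch, not the paper's model. It assumes frame-level features have already been extracted from the processed signal by some self-supervised encoder (random numbers stand in for them here), pools them over time, and maps the pooled vector to a score between 0 and 100 with a single linear layer and a sigmoid. The function names, the feature dimension, and the untrained weights are all hypothetical; no clean reference signal is used, which is what makes the prediction non-intrusive.

```python
import math
import random


def mean_pool(frames):
    """Average frame-level feature vectors over time into one utterance vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def predict_intelligibility(frames, weights, bias):
    """Map pooled features to a score in [0, 100] via a linear layer and a sigmoid."""
    pooled = mean_pool(frames)
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 100.0 / (1.0 + math.exp(-z))


# Stand-in for encoder output: 50 frames of 8-dimensional features.
# These are random values for illustration only, not real speech features.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]

# Hypothetical, untrained parameters; a real predictor would learn these
# from listener responses.
weights = [0.1] * 8
bias = 0.0

score = predict_intelligibility(frames, weights, bias)
print(f"predicted intelligibility: {score:.1f}%")
```

In practice the pooling and regression stages would be replaced by a trained network, and the abstract's caveat applies: such a predictor may not generalise to enhancement systems or listeners absent from the training data.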
