Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory Models (2401.13611v1)

Published 24 Jan 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, the use of feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models has been found to be particularly useful for this task. This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users. Substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
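The abstract describes an exemplar-based memory model that predicts ratings from feature vectors and reports performance as root mean squared error. A minimal sketch of both ideas follows, loosely in the style of exemplar memory models such as Hintzman's MINERVA 2 (similarity to stored exemplars, raised to an odd power, weights the stored ratings). This is not the authors' implementation. The function names, the use of cosine similarity, the cubic activation, and the toy two-dimensional features are all assumptions made for illustration; the paper's actual features come from Whisper decoder layers.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exemplar_predict(probe, exemplars, ratings, power=3):
    """Predict a rating as an activation-weighted average of stored ratings.

    Assumption: activation is similarity raised to an odd power, which
    keeps the sign and emphasises the closest exemplars.
    """
    activations = [cosine_similarity(probe, e) ** power for e in exemplars]
    total = sum(abs(a) for a in activations)
    if total == 0.0:
        # No exemplar is activated: fall back to the mean stored rating.
        return sum(ratings) / len(ratings)
    return sum(a * r for a, r in zip(activations, ratings)) / total


def rmse(predictions, targets):
    """Root mean squared error between predictions and targets."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (hypothetical): two stored exemplars with ratings in percent.
exemplars = [[1.0, 0.0], [0.0, 1.0]]
ratings = [100.0, 0.0]
prediction = exemplar_predict([1.0, 0.0], exemplars, ratings)
```

With this toy data the probe matches the first exemplar exactly, so the prediction equals that exemplar's rating. On a held-out set, `rmse` would be applied to the list of predictions and the corresponding listener scores, which is the kind of figure the abstract compares against the HASPI baseline.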

