Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids (2402.16757v1)

Published 26 Feb 2024 in cs.SD and eess.AS

Abstract: Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict a user's ability to hear ambient sounds that may be important. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio information to improve listening comfort, based upon the preferences of the user. The proposed system estimates the signal-to-noise ratio (SNR) as a basic objective speech quality measure, which quantifies the relative amount of background noise present in speech and correlates directly with the intelligibility of the signal. Additionally, to provide contextual information, we predict the acoustic scene in which the user is situated. These tasks are achieved via a multi-task DL model, which surpasses the performance of inferring the acoustic scene or SNR separately by jointly leveraging a shared encoded feature space. These environmental inferences are exploited in a preference elicitation framework, which linearly learns a set of predictive functions to determine the target SNR of an audio-visual (AV) SE system. By greatly reducing noise in challenging listening conditions, and by scaling the output of the SE model in a novel way, we are able to provide HA users with contextually individualised SE. Preliminary results suggest an improvement over the non-individualised baseline model for some participants.
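
To make the described pipeline concrete, below is a minimal PyTorch sketch of the kind of system the abstract outlines: a shared encoder feeding an SNR regression head and an acoustic scene classification head trained with a joint loss, a per-scene linear preference function mapping the estimated SNR to a user-specific target SNR, and a simple remixing step that scales the SE output. All names, layer sizes, class counts, and the mixing rule are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch only: architecture, shapes, and function names are
    # assumptions for illustration, not the paper's actual code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskEnvModel(nn.Module):
        """Shared encoder with two heads: SNR regression and scene classification."""
        def __init__(self, n_scenes: int = 10, hidden: int = 128):
            super().__init__()
            self.encoder = nn.Sequential(                  # shared encoded feature space
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, hidden), nn.ReLU(),
            )
            self.snr_head = nn.Linear(hidden, 1)           # estimated SNR in dB (regression)
            self.scene_head = nn.Linear(hidden, n_scenes)  # acoustic scene logits

        def forward(self, x: torch.Tensor):
            z = self.encoder(x)
            return self.snr_head(z).squeeze(-1), self.scene_head(z)

    def target_snr(snr_est, scene_id, w, b):
        """Per-scene linear preference function: target = w[scene] * snr_est + b[scene].
        The (w, b) pairs would be fit from the user's elicited preferences."""
        return w[scene_id] * snr_est + b[scene_id]

    def scale_se_output(enhanced, noisy, alpha):
        """Scale the SE output by remixing it with the unprocessed signal; how
        alpha is derived from the target SNR is an assumption here."""
        return alpha * enhanced + (1.0 - alpha) * noisy

    # Joint training step on a batch of log-mel spectrogram patches (shapes assumed).
    model = MultiTaskEnvModel()
    x = torch.randn(8, 1, 64, 100)              # (batch, channel, mels, frames)
    snr_true = torch.randn(8) * 10              # ground-truth SNR labels in dB
    scene_true = torch.randint(0, 10, (8,))     # ground-truth scene labels
    snr_pred, scene_logits = model(x)
    loss = F.mse_loss(snr_pred, snr_true) + F.cross_entropy(scene_logits, scene_true)
    loss.backward()

The shared encoder is the point of the multi-task design: both heads back-propagate into the same feature space, which is what the abstract credits for outperforming separately trained SNR and scene predictors.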

