How to train your ears: Auditory-model emulation for large-dynamic-range inputs and mild-to-severe hearing losses (2403.10428v1)

Published 15 Mar 2024 in eess.AS

Abstract: Advanced auditory models are useful in designing signal-processing algorithms for hearing-loss compensation or speech enhancement. Such auditory models provide rich and detailed descriptions of the auditory pathway, and might allow for individualization of signal-processing strategies based on physiological measurements. However, these auditory models are often computationally demanding, requiring significant time to compute. To address this issue, previous studies have explored the use of deep neural networks to emulate auditory models and reduce inference time. While these deep neural networks offer impressive efficiency gains in terms of computational time, they may suffer from uneven emulation performance as a function of auditory-model frequency channels and input sound pressure levels, making them unsuitable for many tasks. In this study, we demonstrate that the conventional machine-learning optimization objective used in existing state-of-the-art methods is the primary source of this limitation. Specifically, the optimization objective fails to account for the frequency- and level-dependencies of the auditory model, caused by a large input dynamic range and the different types of hearing losses emulated by the auditory model. To overcome this limitation, we propose a new optimization objective that explicitly embeds the frequency- and level-dependencies of the auditory model. Our results show that this new optimization objective significantly improves the emulation performance of deep neural networks across relevant input sound levels and auditory-model frequency channels, without increasing the computational load during inference. Addressing these limitations is essential for advancing the application of auditory models in signal-processing tasks, ensuring their efficacy in diverse scenarios.
