HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
Abstract: This paper introduces HAAQI-Net, a non-intrusive deep learning model for assessing music audio quality for hearing aid users. Unlike traditional intrusive methods such as the Hearing Aid Audio Quality Index (HAAQI), which require comparison against a reference signal, HAAQI-Net offers a more accessible and computationally efficient alternative. Using a Bidirectional Long Short-Term Memory (BLSTM) architecture with an attention mechanism and features extracted from the pre-trained BEATs model, it predicts HAAQI scores directly from music audio clips and hearing loss patterns. Experimental results demonstrate HAAQI-Net's effectiveness: it achieves a Linear Correlation Coefficient (LCC) of 0.9368, a Spearman's Rank Correlation Coefficient (SRCC) of 0.9486, and a Mean Squared Error (MSE) of 0.0064, while reducing inference time from 62.52 to 2.54 seconds. To further cut computational overhead, a knowledge distillation strategy was applied, reducing parameters by 75.85% and inference time by 96.46% while maintaining strong performance (LCC: 0.9071, SRCC: 0.9307, MSE: 0.0091). To expand its capabilities, HAAQI-Net was fine-tuned to predict subjective human ratings such as the Mean Opinion Score (MOS); this adaptation significantly improved prediction accuracy, as validated by statistical analysis. The robustness of HAAQI-Net was also evaluated under varying Sound Pressure Level (SPL) conditions, revealing optimal performance at a reference SPL of 65 dB, with accuracy gradually decreasing as the SPL deviated from this point. These advances in subjective score prediction, SPL robustness, and computational efficiency position HAAQI-Net as a scalable solution for music audio quality assessment in hearing aid applications, contributing to efficient and accurate models in audio signal processing and hearing aid technology.
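To make the described architecture concrete, below is a minimal PyTorch sketch of a HAAQI-Net-style predictor: frame-level BEATs embeddings conditioned on a hearing loss pattern, passed through a BLSTM, pooled with attention, and mapped to a score in [0, 1]. The class name `HAAQINetSketch`, the 768-dim feature size, the 8-band audiogram vector, the layer widths, and the simple additive-attention pooling are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HAAQINetSketch(nn.Module):
    """Hedged sketch of a non-intrusive HAAQI predictor (sizes are assumptions)."""
    def __init__(self, feat_dim=768, hl_dim=8, hidden=128):
        super().__init__()
        # Project the hearing loss (audiogram) vector so it can condition every frame.
        self.hl_proj = nn.Linear(hl_dim, feat_dim)
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Additive attention over BLSTM frames (stand-in for the paper's attention mechanism).
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # HAAQI scores lie in [0, 1]
        )

    def forward(self, beats_feats, hl_pattern):
        # beats_feats: (B, T, feat_dim) frame-level BEATs embeddings
        # hl_pattern:  (B, hl_dim) hearing-loss thresholds
        x = beats_feats + self.hl_proj(hl_pattern).unsqueeze(1)
        h, _ = self.blstm(x)                    # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # per-frame attention weights
        pooled = (w * h).sum(dim=1)             # attention pooling over time
        return self.head(pooled).squeeze(-1)    # predicted quality score

# Usage with random tensors standing in for real BEATs features:
model = HAAQINetSketch()
feats = torch.randn(4, 200, 768)   # 4 clips, 200 frames each
audiogram = torch.randn(4, 8)      # hypothetical 8-band hearing loss patterns
scores = model(feats, audiogram)   # shape (4,), values in [0, 1]
```

The same output head also suits the MOS fine-tuning described above, and the distilled variant would follow the same interface with a smaller `hidden` size trained to match the teacher's predictions.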