Modeling of Speech-dependent Own Voice Transfer Characteristics for Hearables with In-ear Microphones (2310.06554v2)
Abstract: Many hearables contain an in-ear microphone, which can be used to capture the user's own voice. However, because the hearable occludes the ear canal, the in-ear microphone mostly records body-conducted speech, which typically suffers from band-limitation effects and amplification at low frequencies. Since the occlusion effect is determined by the ratio between the air-conducted and body-conducted components of own voice, the own voice transfer characteristics between the outer face of the hearable and the in-ear microphone depend on the speech content and the individual talker. In this paper, we propose a speech-dependent model of the own voice transfer characteristics based on phoneme recognition, assuming a linear time-invariant relative transfer function for each phoneme. We consider both individual models and models averaged over several talkers. Experimental results based on recordings with a prototype hearable show that the proposed speech-dependent model simulates in-ear signals more accurately than a speech-independent model in terms of technical measures, especially under utterance mismatch and talker mismatch. Additionally, simulation results show that talker-averaged models generalize better to different talkers than individual models.
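To illustrate the idea of one linear time-invariant relative transfer function (RTF) per phoneme, the following is a minimal sketch rather than the authors' implementation. It assumes that the outer and in-ear microphone signals are already available as STFT matrices, that each frame carries an integer phoneme label from some recognizer, and that the RTF acts as a per-bin multiplication. The function names `estimate_phoneme_rtfs` and `simulate_in_ear`, the least-squares estimator, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def estimate_phoneme_rtfs(X, Y, labels, eps=1e-12):
    """Least-squares RTF per phoneme and frequency bin.

    X, Y   : complex STFTs of the outer and in-ear microphone, shape (frames, bins)
    labels : integer phoneme label per frame, shape (frames,)
    """
    rtfs = {}
    for p in np.unique(labels):
        frames = labels == p
        # Minimizing sum |Y - H X|^2 over the frames of phoneme p gives
        # H = sum(conj(X) * Y) / sum(|X|^2) for each frequency bin.
        cross = np.sum(np.conj(X[frames]) * Y[frames], axis=0)
        power = np.sum(np.abs(X[frames]) ** 2, axis=0)
        rtfs[int(p)] = cross / (power + eps)
    return rtfs

def simulate_in_ear(X, labels, rtfs):
    """Apply the RTF of each frame's phoneme to the outer-microphone STFT."""
    H = np.stack([rtfs[int(p)] for p in labels])  # shape (frames, bins)
    return H * X

# Synthetic check with two hypothetical phoneme classes (not real recordings).
rng = np.random.default_rng(0)
n_frames, n_bins = 200, 65
X = rng.standard_normal((n_frames, n_bins)) + 1j * rng.standard_normal((n_frames, n_bins))
labels = np.arange(n_frames) % 2  # alternating phoneme labels
H_true = rng.standard_normal((2, n_bins)) + 1j * rng.standard_normal((2, n_bins))
Y = H_true[labels] * X            # noise-free in-ear STFT

rtfs = estimate_phoneme_rtfs(X, Y, labels)
Y_sim = simulate_in_ear(X, labels, rtfs)
```

Under these assumptions, a speech-independent model corresponds to giving every frame the same label, and a talker-averaged model could be approximated by averaging the per-phoneme RTFs estimated from several talkers.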