Multi-Microphone Noise Data Augmentation for DNN-based Own Voice Reconstruction for Hearables in Noisy Environments (2312.08908v1)
Abstract: Hearables with integrated microphones may offer communication benefits in noisy working environments, e.g., by transmitting the recorded own voice of the user. Systems that aim to reconstruct the clean, full-bandwidth own voice from noisy microphone recordings are often based on supervised learning. Recording a sufficient amount of noise for training such a system is costly, since noise transmission between outer and inner microphones varies between individuals. Previously proposed methods either do not consider noise, only consider noise at the outer microphones, or assume inner and outer microphone noise to be independent during training, and it is not yet clear whether individualized noise can benefit the training of an own voice reconstruction system. In this paper, we investigate several noise data augmentation techniques based on measured transfer functions to simulate multi-microphone noise. Using the augmented noise, we train a multi-channel own voice reconstruction system. Experiments using real noise are carried out to investigate the generalization capability. Results show that incorporating augmented noise yields large benefits, and that individualized noise augmentation in particular leads to higher performance.
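The idea of simulating multi-microphone noise from measured transfer functions can be illustrated with a minimal sketch. It assumes that each transfer function is available as a time-domain impulse response and that a single-channel noise recording is filtered once per microphone, which keeps the simulated outer and inner microphone noise correlated. The function name `simulate_multi_mic_noise` and the toy impulse responses are hypothetical and are not taken from the paper, whose actual augmentation techniques may differ.

```python
import numpy as np


def simulate_multi_mic_noise(noise, impulse_responses):
    """Filter one noise signal with one impulse response per microphone.

    Returns an array of shape (num_microphones, num_samples). Each channel
    is truncated to the length of the input noise.
    """
    noise = np.asarray(noise, dtype=float)
    channels = [
        np.convolve(noise, np.asarray(h, dtype=float))[: len(noise)]
        for h in impulse_responses
    ]
    return np.stack(channels)


# Toy example: white noise as a stand-in for a recorded noise signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)

# Hypothetical impulse responses (assumptions for illustration only):
# the outer microphone sees the noise unchanged, the inner microphone
# sees a delayed and attenuated version.
h_outer = np.array([1.0])
h_inner = np.array([0.0, 0.0, 0.3, 0.1])

multi_mic_noise = simulate_multi_mic_noise(noise, [h_outer, h_inner])
print(multi_mic_noise.shape)
```

In this sketch, individualized augmentation would correspond to drawing the impulse responses from measurements of a specific user, whereas non-individualized augmentation would draw them from a pool of other users.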