Speech-dependent Data Augmentation for Own Voice Reconstruction with Hearable Microphones in Noisy Environments
Abstract: Own voice pickup for hearables in noisy environments benefits from using both an outer microphone outside the ear and an in-ear microphone inside the occluded ear. Due to environmental noise recorded at both microphones, as well as low-frequency amplification of the own voice and band-limitation at the in-ear microphone, an own voice reconstruction system is needed to enable communication. A large number of own voice signals is required to train a supervised deep learning-based own voice reconstruction system. Training data can either be obtained by recording a large number of own voice signals of different talkers with a specific device, which is costly, or through augmentation of available speech data. Own voice signals can be simulated by assuming a linear time-invariant relative transfer function between the hearable microphones for each phoneme, referred to as own voice transfer characteristics. In this paper, we propose data augmentation techniques for training an own voice reconstruction system based on speech-dependent models of own voice transfer characteristics between hearable microphones. The proposed techniques use only a few recorded own voice signals to estimate transfer characteristics and can then be used to simulate a large number of own voice signals based on single-channel speech signals. Experimental results show that the proposed speech-dependent individual data augmentation technique leads to better performance than other data augmentation techniques or than training only on the available recorded own voice signals, and that additional fine-tuning on the available recorded signals can further improve performance.
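To make the augmentation idea concrete, the sketch below illustrates how phoneme-dependent relative transfer functions (RTFs) between the outer and in-ear microphone could be estimated from a small amount of recorded own voice data and then applied frame-wise to single-channel speech to simulate in-ear own voice signals. This is a minimal sketch, not the paper's implementation: the function names, the STFT parameters, the least-squares per-phoneme averaging, and the assumption that per-frame phoneme labels (e.g., from forced alignment) are available are all choices made here for illustration.

```python
# Minimal sketch of speech-dependent own voice data augmentation
# (hypothetical helper names; the paper's actual pipeline may differ).
import numpy as np
from scipy.signal import stft, istft

FS = 16000          # sampling rate in Hz (assumed)
NPERSEG = 512       # STFT window length (assumed)
NOVERLAP = 256      # STFT overlap (assumed)


def estimate_phoneme_rtfs(x_outer, x_inear, frame_phonemes):
    """Estimate one relative transfer function per phoneme from a pair of
    time-aligned outer/in-ear own voice recordings, averaging over all
    STFT frames labeled with that phoneme (least-squares estimate)."""
    _, _, X = stft(x_outer, FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    _, _, Y = stft(x_inear, FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    n_frames = min(X.shape[1], Y.shape[1], len(frame_phonemes))
    rtfs = {}
    for p in set(frame_phonemes[:n_frames]):
        idx = [t for t in range(n_frames) if frame_phonemes[t] == p]
        num = np.sum(Y[:, idx] * np.conj(X[:, idx]), axis=1)
        den = np.sum(np.abs(X[:, idx]) ** 2, axis=1) + 1e-12
        rtfs[p] = num / den          # one complex RTF per frequency bin
    return rtfs


def simulate_inear(x_speech, frame_phonemes, rtfs):
    """Simulate an in-ear own voice signal from a single-channel speech
    signal by multiplying each STFT frame with the RTF of its phoneme."""
    _, _, X = stft(x_speech, FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    Y = np.zeros_like(X)
    for t in range(X.shape[1]):
        p = frame_phonemes[min(t, len(frame_phonemes) - 1)]
        h = rtfs.get(p, np.ones(X.shape[0]))   # unity RTF for unseen phonemes
        Y[:, t] = h * X[:, t]
    _, y = istft(Y, FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    return y
```

Applying the RTF per frame corresponds to a multiplicative (frame-wise) transfer function approximation in the STFT domain; once the per-phoneme RTFs are estimated from a few recordings of a device or an individual talker, the second function can be run over a large single-channel speech corpus to generate simulated in-ear training material.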