On Speech Pre-emphasis as a Simple and Inexpensive Method to Boost Speech Enhancement (2401.09315v1)
Abstract: Pre-emphasis filtering, which compensates for the natural energy decay of speech at higher frequencies, has long been a common pre-processing step in a number of speech processing tasks. In this work, we demonstrate, for the first time, that pre-emphasis filtering can also serve as a simple and computationally inexpensive way to improve the performance of deep neural network-based speech enhancement. In particular, we investigate pre-emphasizing both the estimated and the actual clean speech prior to loss calculation, so that the different speech frequency components better reflect their perceptual importance during the training phase. Experimental results on a noisy version of the TIMIT dataset show that integrating this pre-emphasis-based methodology yields relative estimated speech quality improvements of up to 4.6% for noise types seen during the training phase and up to 3.4% for unseen noise types. Just as pre-emphasis is a default pre-processing step in classical automatic speech recognition and speech coding systems, the pre-emphasis-based methodology analyzed in this article may become a default add-on for modern speech enhancement.
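The idea of pre-emphasizing both signals before the loss is computed can be illustrated with a minimal sketch. It is not the authors' implementation: the first-order filter y[n] = x[n] - alpha * x[n-1], the coefficient alpha = 0.97 (a conventional textbook value), the mean squared error loss, and the function names `pre_emphasize`, `mse` and `pre_emphasized_mse` are all assumptions made for illustration, since the abstract does not specify the filter or the loss.

```python
from typing import List, Sequence


def pre_emphasize(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1], taking x[-1] = 0.

    alpha = 0.97 is an assumed, conventional value, not one taken from the paper.
    """
    out: List[float] = []
    prev = 0.0
    for x in signal:
        out.append(x - alpha * prev)
        prev = x
    return out


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pre_emphasized_mse(estimate: Sequence[float],
                       clean: Sequence[float],
                       alpha: float = 0.97) -> float:
    """Assumed loss: MSE computed after pre-emphasizing both signals."""
    return mse(pre_emphasize(estimate, alpha), pre_emphasize(clean, alpha))


# Toy comparison: two estimation errors with the same plain MSE,
# one at the lowest frequency (constant) and one at the highest (alternating).
clean = [0.0] * 8
dc_error = [1.0] * 8
nyquist_error = [1.0, -1.0] * 4

plain_low = mse(dc_error, clean)
plain_high = mse(nyquist_error, clean)
low = pre_emphasized_mse(dc_error, clean)
high = pre_emphasized_mse(nyquist_error, clean)
print(plain_low, plain_high, low, high)
```

In this toy case the plain MSE treats the two errors identically, whereas the pre-emphasized loss penalizes the high-frequency error far more heavily than the low-frequency one. This is the sense in which the filter reweights frequency components inside the training loss; in an actual training setup the same filter would be applied with differentiable tensor operations.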