BERP: A Blind Estimator of Room Acoustic and Physical Parameters for Single-Channel Noisy Speech Signals (2405.04476v5)
Abstract: Room acoustic parameters (RAPs) and room physical parameters (RPPs) are essential metrics for parameterizing the room acoustical characteristics (RACs) of the sound field in a listener's local environment, and they inform a wide range of applications. Current RAP and RPP estimation methods either fail to cover the broad range of real-world acoustic environments with real background noise, or lack a universal framework for blindly estimating RAPs and RPPs from noisy single-channel speech signals, particularly sound source distance, direction of arrival (DOA) of the sound source, and occupancy level. In this paper, we propose a new universal blind estimation framework, the blind estimator of room acoustic and physical parameters (BERP). BERP introduces a new stochastic room impulse response (RIR) model, the sparse stochastic impulse response (SSIR) model, and uses a unified encoder with multiple separate predictors to estimate the RPPs and the SSIR parameters in parallel. This framework enables computationally efficient and universal estimation of room parameters using only noisy single-channel speech signals. All RAPs can then be derived simultaneously from RIRs synthesized from the SSIR model with the estimated parameters. To evaluate the effectiveness of the proposed BERP and SSIR models, we compile a task-specific dataset from several publicly available datasets. The results show that BERP achieves state-of-the-art (SOTA) performance. The evaluation results for the SSIR RIR model also demonstrate its efficacy. The code is available on GitHub.
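The last step of the pipeline, deriving RAPs from a synthesized RIR, can be illustrated with a minimal sketch. The abstract does not specify the SSIR parameterization, so the code below is not the SSIR model: it assumes a generic stand-in RIR made of exponentially decaying Gaussian noise, and all function names (`synth_rir`, `schroeder_edc_db`, `rt60_from_edc`, `c50_db`) are hypothetical. It shows how two common RAPs, reverberation time and clarity C50, could be read off one RIR using Schroeder backward integration.

```python
import numpy as np


def synth_rir(t60, fs=16000, dur=1.0, seed=0):
    """Stand-in RIR (an assumption, not SSIR): Gaussian noise with an
    exponential envelope whose energy falls by 60 dB after t60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    # Energy drops 60 dB at t = t60, so amplitude follows 10^(-3 t / t60).
    return rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / t60)


def schroeder_edc_db(h):
    """Energy decay curve in dB via Schroeder backward integration."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(edc / edc[0])


def rt60_from_edc(edc_db, fs, lo=-5.0, hi=-25.0):
    """T20-style estimate: fit a line to the decay between -5 dB and -25 dB,
    then extrapolate the slope to a 60 dB drop."""
    idx = np.where((edc_db <= lo) & (edc_db >= hi))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # slope in dB per second
    return -60.0 / slope


def c50_db(h, fs):
    """Clarity C50: ratio of energy in the first 50 ms to the remaining energy, in dB."""
    n = int(0.05 * fs)
    return 10.0 * np.log10(np.sum(h[:n] ** 2) / np.sum(h[n:] ** 2))


fs = 16000
h = synth_rir(t60=0.5, fs=fs)
edc = schroeder_edc_db(h)
print(f"T60 ~ {rt60_from_edc(edc, fs):.2f} s, C50 ~ {c50_db(h, fs):.1f} dB")
```

Because every parameter is computed from the same RIR, swapping the stand-in for an RIR synthesized from estimated model parameters would yield all RAPs in one pass, which is the property the abstract highlights.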