Blind Identification of Binaural Room Impulse Responses from Smart Glasses (2403.19217v2)
Abstract: Smart glasses are increasingly recognized as a key medium for augmented reality, offering a hands-free platform with integrated microphones and non-ear-occluding loudspeakers that can seamlessly mix virtual sound sources into the real-world acoustic scene. To integrate virtual sound sources convincingly, their room acoustic rendering must match the real-world acoustics. However, information about a user's acoustic environment is typically not available. This work uses a microphone array in a pair of smart glasses to blindly identify binaural room impulse responses (BRIRs) from a few seconds of speech in the real-world environment. The proposed method uses dereverberation and beamforming to generate a pseudo reference signal, which a multichannel Wiener filter then uses to estimate room impulse responses; these are subsequently converted to BRIRs. The multichannel room impulse responses can also be used to estimate room acoustic parameters, and this approach is shown to outperform baseline algorithms in the estimation of reverberation time and direct-to-reverberant energy ratio. Results from a listening experiment further indicate that the estimated BRIRs often reproduce the real-world room acoustics perceptually more convincingly than measured BRIRs from other rooms of similar size.
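The abstract names two room acoustic parameters that can be derived from an identified room impulse response: reverberation time and direct-to-reverberant energy ratio (DRR). The following is a minimal sketch of how such parameters are conventionally computed from a single impulse response, not the evaluation procedure of the paper. It assumes Schroeder backward integration with a line fit between -5 dB and -25 dB (the common T20 range) and a direct-sound window of 2.5 ms on each side of the strongest peak; the function names, the window length, and the synthetic test response are all illustrative assumptions.

```python
import math


def energy_decay_curve_db(rir):
    """Schroeder backward integration, normalised to 0 dB at the first sample."""
    edc = []
    running = 0.0
    for sample in reversed(rir):
        running += sample * sample
        edc.append(running)
    edc.reverse()
    total = edc[0] if edc else 0.0
    if total <= 0.0:
        raise ValueError("impulse response contains no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf for e in edc]


def reverberation_time(rir, fs, start_db=-5.0, stop_db=-25.0):
    """Fit a line to the decay curve between start_db and stop_db and
    extrapolate it to a 60 dB decay (assumed T20-style evaluation range)."""
    edc_db = energy_decay_curve_db(rir)
    points = [(n / fs, level) for n, level in enumerate(edc_db)
              if stop_db <= level <= start_db]
    if len(points) < 2:
        raise ValueError("not enough decay range for a line fit")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    num = sum((t - mean_t) * (l - mean_l) for t, l in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    slope = num / den  # decay rate in dB per second (negative)
    return -60.0 / slope


def direct_to_reverberant_ratio_db(rir, fs, half_window_s=0.0025):
    """Energy around the strongest peak relative to the energy after it.
    The 2.5 ms half window is an assumption made for this sketch."""
    peak = max(range(len(rir)), key=lambda n: abs(rir[n]))
    half = int(round(half_window_s * fs))
    lo = max(0, peak - half)
    hi = min(len(rir), peak + half + 1)
    direct = sum(x * x for x in rir[lo:hi])
    reverberant = sum(x * x for x in rir[hi:])
    if reverberant == 0.0:
        return math.inf
    return 10.0 * math.log10(direct / reverberant)


# Synthetic check: an exponential envelope that loses 60 dB in 0.4 s.
fs = 8000
true_t60 = 0.4
synthetic_rir = [10.0 ** (-3.0 * n / (fs * true_t60)) for n in range(int(0.6 * fs))]
estimated_t60 = reverberation_time(synthetic_rir, fs)
print(f"estimated reverberation time: {estimated_t60:.3f} s")
```

For a multichannel estimate, the same functions could be applied per channel and the results averaged; whether the paper aggregates channels this way is not stated in the abstract.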