Exploiting an External Microphone for Binaural RTF-Vector-Based Direction of Arrival Estimation for Multiple Speakers (2307.04460v1)
Abstract: In hearing aid applications, an important objective is to accurately estimate the direction of arrival (DOA) of multiple speakers in noisy and reverberant environments. Recently, we proposed a binaural DOA estimation method in which the DOAs of the speakers are estimated by selecting the directions that maximize the so-called Hermitian angle spectrum between the estimated relative transfer function (RTF) vector and a database of prototype anechoic RTF vectors. The RTF vector is estimated using the covariance whitening (CW) method, which requires a computationally expensive generalized eigenvalue decomposition. The spatial spectrum is obtained by considering only those frequencies where one speaker is likely to dominate over the other speakers, noise, and reverberation. In this contribution, we exploit the availability of an external microphone that is spatially separated from the hearing aid microphones and consider a low-complexity RTF vector estimation method that assumes low spatial coherence between the undesired components in the external microphone and those in the hearing aid microphones. Using recordings of two speakers and diffuse-like babble noise in acoustic environments with mild reverberation and low signal-to-noise ratio, simulation results show that the proposed method achieves DOA estimation performance comparable to that of the CW method at a lower computational complexity.
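The two ideas in the abstract, a low-complexity RTF vector estimate based on the external microphone and a search over prototype RTF vectors using the Hermitian angle, can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: the function names, the free-field two-microphone prototype model (`toy_prototype_rtf`), the microphone spacing, and the use of a negated sum of Hermitian angles as the spatial spectrum are all hypothetical choices made for illustration. It handles a single noise-free speaker and omits the frequency subset selection mentioned in the abstract.

```python
import cmath
import math


def hermitian_angle(a, b):
    """Hermitian angle (radians) between two complex vectors a and b."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    # Clamp to 1 to guard against rounding slightly above the valid acos range.
    return math.acos(min(1.0, abs(inner) / (norm_a * norm_b)))


def sc_rtf_estimate(ha_frames, ext_frames, ref=0):
    """Estimate the RTF vector at one frequency from cross-PSDs with the external mic.

    ha_frames:  list of frames, each a list of complex STFT coefficients
                (one per hearing aid microphone).
    ext_frames: list of complex STFT coefficients of the external microphone.
    Assumes the undesired components in the external microphone are
    uncorrelated with those in the hearing aid microphones.
    """
    num_mics = len(ha_frames[0])
    cross = [0j] * num_mics
    for y, y_ext in zip(ha_frames, ext_frames):
        for m in range(num_mics):
            cross[m] += y[m] * y_ext.conjugate()
    # The 1/num_frames averaging factor cancels in this normalization.
    return [c / cross[ref] for c in cross]


def toy_prototype_rtf(doa_deg, freq_hz, mic_spacing_m=0.15, speed_of_sound=343.0):
    """Hypothetical free-field two-microphone prototype RTF (reference mic first)."""
    delay = mic_spacing_m * math.sin(math.radians(doa_deg)) / speed_of_sound
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def spatial_spectrum(estimated_rtfs, freqs_hz, candidate_doas_deg):
    """Negated sum of Hermitian angles over frequency (assumed spectrum definition)."""
    spectrum = []
    for doa in candidate_doas_deg:
        total = 0.0
        for rtf, freq in zip(estimated_rtfs, freqs_hz):
            total -= hermitian_angle(rtf, toy_prototype_rtf(doa, freq))
        spectrum.append(total)
    return spectrum


# Toy usage: one speaker at 30 degrees, noise-free signals.
freqs = [500.0, 1000.0, 1500.0]
true_doa = 30.0
source = [1 + 1j, -0.5 + 2j, 0.3 - 1j, 2 + 0j]  # toy source STFT coefficients
estimated = []
for f in freqs:
    a = toy_prototype_rtf(true_doa, f)
    ha = [[a_m * s for a_m in a] for s in source]
    ext = [0.8 * s for s in source]  # external mic with an arbitrary gain
    estimated.append(sc_rtf_estimate(ha, ext))

candidates = list(range(-90, 91, 5))
spectrum = spatial_spectrum(estimated, freqs, candidates)
best_doa = candidates[spectrum.index(max(spectrum))]
print(best_doa)
```

The estimate in `sc_rtf_estimate` needs only one cross-PSD per hearing aid microphone and a division, which is where the complexity saving relative to a generalized eigenvalue decomposition would come from in this simplified picture. For multiple speakers, one would presumably pick several peaks of the spectrum rather than a single maximum.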