Head Orientation Estimation with Distributed Microphones Using Speech Radiation Patterns (2312.01808v1)
Abstract: Determining the head orientation of a talker is not only beneficial for various speech signal processing applications, such as source localization or speech enhancement, but also facilitates intuitive voice control and interaction with smart environments or modern car assistants. Most approaches to head orientation estimation rely on visual cues, which require camera systems that are often not available. We present an approach that uses only audio signals captured by a few microphones distributed around the talker. Specifically, we propose a novel method that directly incorporates measured or modeled speech radiation patterns to infer the talker's orientation during active speech periods based on a cosine similarity measure. Moreover, we propose an automatic gain adjustment technique for uncalibrated, irregular microphone setups, such as ad-hoc sensor networks. In experiments with signals recorded in both anechoic and reverberant environments, the proposed method, using either measured or modeled speech radiation patterns, outperforms state-of-the-art approaches.
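The abstract names the core idea: compare what the microphones observe with what a radiation pattern predicts for each candidate orientation, and score the match with cosine similarity. It does not give the details. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes known talker and microphone positions, reduces each microphone's observation to a single broadband energy, and uses a hypothetical cardioid-like pattern instead of a measured or modeled speech directivity. It also assumes a free-field 1/r² energy decay and searches orientations on a 1° grid in the horizontal plane. The helper names `directivity`, `template`, and `estimate_orientation` are illustrative only.

```python
import numpy as np


def directivity(rel_angle, alpha=0.6):
    """Hypothetical frequency-independent magnitude pattern (first-order,
    cardioid-like): 1.0 on-axis, 2*alpha - 1 directly behind the talker.
    A stand-in for a measured or modeled speech radiation pattern."""
    return alpha + (1.0 - alpha) * np.cos(rel_angle)


def template(talker_xy, mic_xy, orientation):
    """Predicted per-microphone energies for a talker facing `orientation`
    (radians, azimuth in the horizontal plane)."""
    offsets = mic_xy - talker_xy  # (M, 2) vectors from talker to each mic
    mic_azimuth = np.arctan2(offsets[:, 1], offsets[:, 0])
    dist = np.linalg.norm(offsets, axis=1)
    # Squared directivity at each mic's angle relative to the facing
    # direction, times an assumed free-field 1/r^2 energy decay.
    return directivity(mic_azimuth - orientation) ** 2 / dist ** 2


def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def estimate_orientation(energies, talker_xy, mic_xy, n_candidates=360):
    """Grid search: return the candidate orientation (degrees) whose template
    is most similar to the observed energy vector, plus all scores."""
    candidates = np.linspace(0.0, 2.0 * np.pi, n_candidates, endpoint=False)
    scores = np.array([cosine_similarity(energies, template(talker_xy, mic_xy, phi))
                       for phi in candidates])
    return float(np.degrees(candidates[np.argmax(scores)])), scores


# Usage: an irregular four-microphone layout, talker at the origin facing 30 degrees.
mics = np.array([[2.0, 0.0], [0.0, 1.5], [-1.0, 0.5], [0.5, -2.0]])
talker = np.array([0.0, 0.0])
energies = 0.37 * template(talker, mics, np.radians(30.0))  # unknown overall level
est, scores = estimate_orientation(energies, talker, mics)
print(round(est, 3))  # 30.0
```

Cosine similarity ignores the overall scale of the observed vector, so the unknown speech level (the factor 0.37 above) drops out. A per-microphone gain mismatch does not drop out. This sketch therefore assumes calibrated microphones. The abstract's automatic gain adjustment targets the uncalibrated case and is not modeled here.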