Graph neural networks for sound source localization on distributed microphone networks
Abstract: Distributed Microphone Arrays (DMAs) pose many challenges that centralized microphone arrays do not. An important requirement for applications on these arrays is the ability to handle a variable number of input channels. We consider Graph Neural Networks (GNNs) as a solution to this challenge. We present a localization method based on the Relation Network GNN, which we show shares many similarities with classical signal processing algorithms for Sound Source Localization (SSL). We apply our method to the task of SSL and validate it experimentally using a number of microphones not seen during training. We test different feature extractors and show that our approach significantly outperforms classical baselines.
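To make concrete how a Relation Network can accept a variable number of input channels, the following is a minimal sketch and not the authors' implementation. It applies a shared pairwise function to every ordered pair of microphone feature vectors, pools the results, and maps the pooled vector to an output. The feature dimensions, the single-layer functions `pair_fn` and `W_fuse`, the random untrained weights, and the use of mean pooling (the original Relation Network formulation uses a sum) are all assumptions made for illustration.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
FEAT_DIM, HIDDEN_DIM, OUT_DIM = 8, 16, 2

# Randomly initialised, untrained weights (assumption: one layer per function).
W_pair = rng.standard_normal((2 * FEAT_DIM, HIDDEN_DIM))
W_fuse = rng.standard_normal((HIDDEN_DIM, OUT_DIM))


def pair_fn(x_i, x_j):
    """Shared pairwise function: one ReLU layer on the concatenated pair."""
    return np.maximum(np.concatenate([x_i, x_j]) @ W_pair, 0.0)


def relation_network(features):
    """Map an (M, FEAT_DIM) array of per-microphone features to an OUT_DIM vector.

    The same pair_fn is applied to every ordered pair (i, j) with i != j,
    so the number of microphones M can change without changing any weights.
    """
    num_mics = features.shape[0]
    pair_outputs = [
        pair_fn(features[i], features[j])
        for i, j in itertools.permutations(range(num_mics), 2)
    ]
    # Mean pooling is an assumption of this sketch; it keeps the pooled
    # vector on a comparable scale as the number of pairs grows.
    pooled = np.mean(pair_outputs, axis=0)
    return pooled @ W_fuse


# The same weights handle 4 or 7 microphones.
estimate_4 = relation_network(rng.standard_normal((4, FEAT_DIM)))
estimate_7 = relation_network(rng.standard_normal((7, FEAT_DIM)))
```

Because pooling runs over all ordered pairs, the output of this sketch does not depend on the order in which microphones are listed. This pairwise-then-combine structure is loosely analogous to classical localization methods that combine measurements computed from microphone pairs.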