The Neural-SRP method for positional sound source localization (2403.09455v1)
Abstract: Steered Response Power (SRP) is a widely used method for sound source localization with microphone arrays, and it localizes sources satisfactorily in many practical scenarios. However, its performance degrades in highly reverberant environments. Although Deep Neural Networks (DNNs) have been proposed to overcome this limitation, most are trained for a specific number of microphones with fixed spatial coordinates. This restricts their use in scenarios common in wireless acoustic sensor networks, where each application has an ad-hoc microphone topology. We propose Neural-SRP, a DNN that combines the flexibility of SRP with the performance gains of DNNs. We train our network using simulated data and transfer learning, and evaluate our approach on recorded and simulated data. Results show that Neural-SRP significantly outperforms the baselines in localization performance.
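For readers unfamiliar with the classical method the abstract builds on, the following is a minimal sketch of conventional SRP with PHAT weighting over a set of candidate positions. It is not the Neural-SRP network. The function names (`gcc_phat`, `srp_phat`), the fixed speed of sound, and the rounding of steering delays to whole samples are illustrative assumptions. The sketch shows why SRP is topology-agnostic: microphone coordinates enter only through the steering delays, so any number of microphones at any positions can be used.

```python
import itertools

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed constant for this sketch


def gcc_phat(x1, x2, n_fft):
    """Cross-correlation of x1 and x2 with PHAT (phase-only) weighting.

    Index k of the result holds the correlation at lag k, with negative
    lags wrapped around to the end of the array (lag -k sits at n_fft - k).
    """
    X1 = np.fft.rfft(x1, n=n_fft)
    X2 = np.fft.rfft(x2, n=n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # keep the phase, discard the magnitude
    return np.fft.irfft(cross, n=n_fft)


def srp_phat(signals, mic_positions, candidate_points, fs):
    """Steered response power at each candidate source position.

    signals:          (n_mics, n_samples) array of time-domain signals
    mic_positions:    (n_mics, dim) array of microphone coordinates in metres
    candidate_points: (n_points, dim) array of positions to evaluate
    fs:               sampling rate in Hz
    """
    n_fft = 2 * signals.shape[1]  # zero-pad to avoid circular wrap-around
    power = np.zeros(len(candidate_points))
    for i, j in itertools.combinations(range(len(mic_positions)), 2):
        cc = gcc_phat(signals[i], signals[j], n_fft)
        # Time difference of arrival each candidate point would produce
        # for this microphone pair, rounded to the nearest sample.
        d_i = np.linalg.norm(candidate_points - mic_positions[i], axis=1)
        d_j = np.linalg.norm(candidate_points - mic_positions[j], axis=1)
        lags = np.rint((d_i - d_j) * fs / SPEED_OF_SOUND).astype(int)
        power += cc[lags % n_fft]
    return power
```

Calling `srp_phat(signals, mic_positions, grid, fs)` returns one power value per candidate point, and the index of the largest value gives the position estimate. In strong reverberation, reflections add spurious peaks to each pairwise correlation, which is the degradation the abstract refers to.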