Low-power SNN-based audio source localisation using a Hilbert Transform spike encoding scheme (2402.11748v3)
Abstract: Sound source localisation is used in many consumer devices to isolate audio from individual speakers and reject noise. Localisation is frequently accomplished by "beamforming", which combines phase-shifted audio streams to increase power from chosen source directions, given a known microphone array geometry. Dense band-pass filters are often needed to obtain narrowband signal components from wideband audio. These approaches achieve high accuracy, but narrowband beamforming is computationally demanding and not ideal for low-power IoT devices. We demonstrate a novel method for sound source localisation on arbitrary microphone arrays, designed for efficient implementation in ultra-low-power spiking neural networks (SNNs). We use a Hilbert transform to avoid dense band-pass filters, and introduce a new event-based encoding method that captures the phase of the complex analytic signal. Our approach achieves state-of-the-art accuracy for SNN methods, comparable with traditional non-SNN super-resolution beamforming. We deploy our method to low-power SNN inference hardware, where it consumes much less power than super-resolution methods. We demonstrate that signal processing approaches co-designed with spiking neural network implementations can achieve much improved power efficiency. Our new Hilbert-transform-based method for beamforming can also improve the efficiency of traditional DSP-based signal processing.
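To make the idea of encoding analytic-signal phase as events concrete, here is a minimal sketch. It is not the encoder described in the paper, whose details the abstract does not give. It assumes a simple rule (emit one event each time the wrapped phase of the analytic signal jumps from +pi to -pi), a single 500 Hz test tone, and a hypothetical two-microphone delay of 4 samples. The function name `phase_events` and all parameter values are illustrative only.

```python
import numpy as np
from scipy.signal import hilbert


def phase_events(x, fs):
    """Return event times (s) at which the analytic-signal phase wraps.

    Illustrative rule (an assumption, not the paper's encoder): one event
    per wrap of the phase from +pi to -pi, i.e. one event per cycle.
    """
    analytic = hilbert(x)              # x + j * HilbertTransform(x)
    phase = np.angle(analytic)         # wrapped phase in (-pi, pi]
    wraps = np.flatnonzero(np.diff(phase) < -np.pi)
    return (wraps + 1) / fs            # time of the first sample after each wrap


fs = 16_000                            # sample rate (Hz), assumed
n = np.arange(1600)                    # 0.1 s of audio
t = n / fs
f0 = 500.0                             # test tone (Hz), assumed
delay = 4 / fs                         # hypothetical inter-microphone delay (s)

# Small phase offset keeps wraps away from exact sample instants.
mic_a = np.cos(2 * np.pi * f0 * t + np.pi / 32)
mic_b = np.cos(2 * np.pi * f0 * (t - delay) + np.pi / 32)

ev_a = phase_events(mic_a, fs)
ev_b = phase_events(mic_b, fs)

# The event-time offset between channels reflects the propagation delay.
estimated_delay = np.mean(ev_b - ev_a)
print(f"events per channel: {len(ev_a)}, estimated delay: {estimated_delay:.6f} s")
```

In this toy setting the offset between event times on the two channels carries the inter-microphone delay, which is the quantity a beamformer exploits. No band-pass filter bank is needed for a single tone; how the approach extends to wideband audio is the subject of the paper itself.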