Spiking Two-Stream Methods with Unsupervised STDP-based Learning for Action Recognition (2306.13783v1)
Abstract: Video analysis is a computer vision task with applications in surveillance, human-machine interaction, and autonomous vehicles. Deep Convolutional Neural Networks (CNNs) are currently the state-of-the-art methods for video analysis. However, they have high computational costs and need large amounts of labeled data for training. In this paper, we use Convolutional Spiking Neural Networks (CSNNs) trained with the unsupervised Spike Timing-Dependent Plasticity (STDP) learning rule for action classification. These networks represent information using asynchronous, low-energy spikes, which makes them more energy-efficient and better suited to neuromorphic hardware. However, the behaviour of CSNNs with spatio-temporal computer vision models has not been studied sufficiently. Therefore, we explore transposing two-stream neural networks into the spiking domain. Implementing this model with unsupervised STDP-based CSNNs allows us to further study the performance of these networks on video analysis. We show that two-stream CSNNs can successfully extract spatio-temporal information from videos despite using limited training data, and that the spiking spatial and temporal streams are complementary. We also show that using a spatio-temporal stream within a spiking STDP-based two-stream architecture leads to information redundancy and does not improve performance.
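For readers unfamiliar with the STDP rule named in the abstract, the following is a minimal sketch of a generic, textbook pair-based STDP weight update. It is not the specific rule used in this paper; the function names, the parameter values (`a_plus`, `a_minus`, `tau_plus`, `tau_minus`), the weight bounds, and the convention that simultaneous spikes count as potentiation are all illustrative assumptions.

```python
import math


def stdp_delta(t_pre, t_post, a_plus=0.01, a_minus=0.012,
               tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (times in ms).

    Generic pair-based STDP; all constants are hypothetical defaults.
    """
    dt = t_post - t_pre
    if dt >= 0:
        # Pre-synaptic spike precedes (or coincides with) the post-synaptic
        # spike: potentiation, decaying exponentially with the time gap.
        return a_plus * math.exp(-dt / tau_plus)
    # Post-synaptic spike precedes the pre-synaptic spike: depression.
    return -a_minus * math.exp(dt / tau_minus)


def apply_stdp(w, t_pre, t_post, w_min=0.0, w_max=1.0, **kwargs):
    """Return the updated weight, clipped to the assumed range [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_delta(t_pre, t_post, **kwargs)))


if __name__ == "__main__":
    w = 0.5
    w = apply_stdp(w, t_pre=10.0, t_post=15.0)  # causal pair: weight increases
    w = apply_stdp(w, t_pre=30.0, t_post=22.0)  # anti-causal pair: weight decreases
    print(w)
```

The update depends only on local spike timing and needs no labels, which is why a rule of this kind can be used for unsupervised feature learning.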
- Mireille El-Assal
- Pierre Tirilly
- Ioan Marius Bilasco