Predictive Temporal Attention on Event-based Video Stream for Energy-efficient Situation Awareness (2402.08936v1)
Abstract: The Dynamic Vision Sensor (DVS) is an innovative technology that efficiently captures and encodes visual information in an event-driven manner. When combined with event-driven neuromorphic processing, the sparsity of the DVS camera output can yield high energy efficiency. However, as in many embedded systems, the off-chip communication between the camera and the processor is a power-consumption bottleneck. Inspired by the predictive coding model and the expectation suppression phenomenon found in the human brain, we propose a temporal attention mechanism that throttles the camera output and attends to it only when the visual events cannot be well predicted. The predictive attention not only reduces power consumption in the sensor-processor interface but also effectively decreases the computational workload by filtering out noisy events. We demonstrate that the predictive attention reduces data communication between the camera and the processor by 46.7% and computation activity in the processor by 43.8%.
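To make the gating idea concrete, below is a minimal Python/NumPy sketch of the kind of predictive temporal attention the abstract describes: a predictor supplies an expected event frame, and camera output is forwarded to the processor only when the observed events deviate from that expectation. The predictor interface, the mean-absolute-error measure, and the `error_threshold` value are illustrative assumptions, not the paper's exact design.

```python
import numpy as np


def predictive_attention_gate(predicted_frame: np.ndarray,
                              actual_frame: np.ndarray,
                              error_threshold: float = 0.1) -> bool:
    """Return True if the camera output should be forwarded to the processor.

    The gate opens ("pays attention") only when the actual event frame
    deviates sufficiently from the prediction, i.e. when the events
    cannot be well predicted.  The error measure and threshold are
    placeholders for whatever similarity metric the system uses.
    """
    error = np.abs(actual_frame - predicted_frame).mean()
    return error > error_threshold


def process_stream(frames, predictor, error_threshold: float = 0.1) -> int:
    """Illustrative streaming loop: suppress well-predicted frames.

    `predictor` is a hypothetical callable that maps the previous event
    frame to an expectation of the next one.  Frames that match the
    expectation are not transmitted, saving sensor-processor bandwidth
    and downstream computation.
    """
    transmitted = 0
    prev = np.zeros_like(frames[0])
    for frame in frames:
        predicted = predictor(prev)  # expectation of the next event frame
        if predictive_attention_gate(predicted, frame, error_threshold):
            transmitted += 1         # forward only the "surprising" events
            # ... downstream neuromorphic processing would run here ...
        prev = frame
    return transmitted
```

As a usage example, `process_stream(frames, predictor=lambda prev: prev)` gates the stream with a trivial "no-change" predictor; the paper's learned predictor would replace this placeholder.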