
Finding Visual Saliency in Continuous Spike Stream (2403.06233v1)

Published 10 Mar 2024 in cs.CV

Abstract: As a bio-inspired vision sensor, the spike camera emulates the operational principles of the fovea, a compact retinal region, by employing spike discharges to encode the accumulation of per-pixel luminance intensity. Leveraging its high temporal resolution and bio-inspired neuromorphic design, the spike camera holds significant promise for advancing computer vision applications. Saliency detection mimics human visual behavior and captures the most salient region of a scene. In this paper, we investigate visual saliency in the continuous spike stream for the first time. To effectively process the binary spike stream, we propose a Recurrent Spiking Transformer (RST) framework, which is based on a full spiking neural network (SNN). Our framework enables the extraction of spatio-temporal features from the continuous spike stream while maintaining low power consumption. To facilitate the training and validation of our proposed model, we build a comprehensive real-world spike-based visual saliency dataset covering numerous lighting conditions. Extensive experiments demonstrate the superior performance of our Recurrent Spiking Transformer framework in comparison to other spiking neural network-based methods. Our framework exhibits a substantial margin of improvement in capturing and highlighting visual saliency in the spike stream, which not only provides a new perspective for spike-based saliency segmentation but also shows a new paradigm for full SNN-based transformer models. The code and dataset are available at https://github.com/BIT-Vision/SVS.
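
The abstract describes the spike camera as encoding accumulated per-pixel luminance with spike discharges. The following is a minimal sketch of that idea as a simple integrate-and-fire model, not the authors' implementation or the sensor's actual circuit. The function name `simulate_spike_stream`, the `threshold` parameter, the subtract-on-fire reset, and the toy luminance values are all illustrative assumptions.

```python
import numpy as np

def simulate_spike_stream(luminance, num_steps, threshold=1.0):
    """Toy integrate-and-fire model of a spike camera (illustrative assumption).

    luminance: 2D array of per-pixel intensities added to the accumulator each step.
    Returns a binary array of shape (num_steps, H, W).
    """
    accumulator = np.zeros_like(luminance, dtype=np.float64)
    spikes = np.zeros((num_steps,) + luminance.shape, dtype=np.uint8)
    for t in range(num_steps):
        accumulator += luminance            # integrate incoming light
        fired = accumulator >= threshold    # pixels whose accumulator reached the threshold
        spikes[t] = fired                   # emit a binary spike for those pixels
        accumulator[fired] -= threshold     # subtract the threshold, keep the residue
    return spikes

# Hypothetical 2x2 scene: brighter pixels fire more often.
luminance = np.array([[0.0, 0.25], [0.5, 1.0]])
stream = simulate_spike_stream(luminance, num_steps=8)
firing_rate = stream.mean(axis=0)  # fraction of steps on which each pixel fired
```

In this toy setting the per-pixel firing rate tracks the luminance, which is why a binary spike stream can carry intensity information at high temporal resolution.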
