Neuromorphic Vision-based Motion Segmentation with Graph Transformer Neural Network (2404.10940v1)
Abstract: Moving object segmentation is critical for interpreting scene dynamics in robotic navigation systems operating in challenging environments. Neuromorphic vision sensors are well suited to motion perception due to their asynchronous nature, high temporal resolution, and reduced power consumption. However, their unconventional output requires novel perception paradigms that can exploit their spatially sparse and temporally dense nature. In this work, we propose a novel event-based motion segmentation algorithm using a Graph Transformer Neural Network, dubbed GTNN. Our algorithm processes event streams as 3D graphs through a series of nonlinear transformations that unveil local and global spatiotemporal correlations between events. Based on these correlations, events belonging to moving objects are segmented from the background without prior knowledge of the dynamic scene geometry. The algorithm is trained on publicly available datasets, including MOD, EV-IMO, and EV-IMO2, using the proposed training scheme, which facilitates efficient training on extensive datasets. Moreover, we introduce the Dynamic Object Mask-aware Event Labeling (DOMEL) approach for generating approximate ground-truth labels for event-based motion segmentation datasets. We use DOMEL to label our own recorded Event dataset for Motion Segmentation (EMS-DOMEL), which we release to the public for further research and benchmarking. Rigorous experiments on several unseen publicly available datasets show that GTNN outperforms state-of-the-art methods in the presence of dynamic background variations, varied motion patterns, and multiple dynamic objects of differing sizes and velocities. GTNN achieves significant performance gains, with an average increase of 9.4% in motion segmentation accuracy (IoU%) and 4.5% in detection rate (DR%).
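The abstract states that GTNN "processes event streams as 3D graphs" in which spatiotemporal correlations between events are exploited. The following is a minimal sketch of how such a 3D spatiotemporal event graph could be constructed, assuming each event is a node with coordinates (x, y, scaled t) connected to its k nearest neighbours; the normalization constant, k, and the node features are illustrative assumptions, not the paper's exact construction.

```python
# Sketch: building a 3D spatiotemporal graph from an event stream.
# Coordinates, time scaling, and k are assumed for illustration only.
import numpy as np
from scipy.spatial import cKDTree

def build_event_graph(events, k=8, time_scale=1e-3):
    """events: (N, 4) array of [x, y, t, polarity].

    Returns per-node features (N, 4) and a directed edge list (2, N*k)
    connecting each event to its k nearest spatiotemporal neighbours.
    """
    xy = events[:, :2].astype(np.float32)
    t = events[:, 2:3].astype(np.float32)
    t = (t - t.min()) / time_scale            # rescale time so it is comparable to pixel units
    coords = np.hstack([xy, t])               # 3D node coordinates (x, y, scaled t)

    tree = cKDTree(coords)
    _, idx = tree.query(coords, k=k + 1)      # nearest neighbour of each node is itself
    src = np.repeat(np.arange(len(events)), k)
    dst = idx[:, 1:].reshape(-1)
    edges = np.stack([src, dst])              # directed edges: node -> neighbour

    features = events.astype(np.float32)      # per-node features: x, y, t, polarity
    return features, edges

# Usage with synthetic events on a 640x480 sensor over 10 ms
rng = np.random.default_rng(0)
ev = np.column_stack([
    rng.uniform(0, 640, 1000),                # x
    rng.uniform(0, 480, 1000),                # y
    np.sort(rng.uniform(0, 0.01, 1000)),      # timestamps in seconds
    rng.choice([-1, 1], 1000),                # polarity
])
feats, edges = build_event_graph(ev)
print(feats.shape, edges.shape)               # (1000, 4) (2, 8000)
```

A graph built this way preserves the sparse, asynchronous structure of the event stream, so a downstream graph transformer can attend over neighbourhoods in space-time rather than over dense image frames.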