RadarMOSEVE: A Spatial-Temporal Transformer Network for Radar-Only Moving Object Segmentation and Ego-Velocity Estimation (2402.14380v1)
Abstract: Moving object segmentation (MOS) and ego-velocity estimation (EVE) are vital capabilities for mobile systems to achieve full autonomy. Several approaches have attempted to achieve MOSEVE using a LiDAR sensor. However, LiDAR sensors are typically expensive and susceptible to adverse weather conditions. Instead, millimeter-wave radar (MWR) has gained popularity for real-world applications in robotics and autonomous driving due to its cost-effectiveness and resilience to bad weather. Nonetheless, publicly available MOSEVE datasets and approaches using radar data are limited. Some existing methods adopt point convolutional networks from LiDAR-based approaches, ignoring the specific artifacts and the valuable radial velocity information of radar measurements, which leads to suboptimal performance. In this paper, we propose a novel transformer network that effectively addresses the sparsity and noise issues and leverages the radial velocity measurements of radar points through our devised radar self- and cross-attention mechanisms. Building on these, our method simultaneously achieves accurate EVE of the robot and performs MOS using only radar data. To thoroughly evaluate the MOSEVE performance of our method, we annotated the radar points in the public View-of-Delft (VoD) dataset and additionally constructed a new radar dataset covering various environments. The experimental results demonstrate the superiority of our approach over existing state-of-the-art methods. The code is available at https://github.com/ORCA-Uboat/RadarMOSEVE.
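To give intuition for why radial velocity is so valuable for EVE: for a static point, the Doppler measurement is fully determined by the sensor's own motion, so stacking many static points yields a linear system in the ego-velocity. The following is a minimal classical least-squares sketch of this idea (not the paper's transformer network); the function name, 2D setup, and synthetic data are illustrative assumptions.

```python
import numpy as np

# For a static point at azimuth theta, the measured radial velocity satisfies
#   v_r = -(vx * cos(theta) + vy * sin(theta))
# for 2D sensor velocity (vx, vy). Stacking N points gives an (N, 2) linear
# least-squares problem. Moving objects violate this model, so in practice
# their large residuals can be used to reject them as outliers (e.g. RANSAC).

def estimate_ego_velocity(azimuths, radial_velocities):
    """Least-squares 2D ego-velocity from Doppler returns of static points."""
    A = -np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)  # (N, 2) design matrix
    v_ego, *_ = np.linalg.lstsq(A, radial_velocities, rcond=None)
    return v_ego  # (vx, vy)

# Synthetic check: static points observed from a sensor moving at (2.0, 0.5) m/s.
rng = np.random.default_rng(0)
az = rng.uniform(-np.pi, np.pi, 100)
true_v = np.array([2.0, 0.5])
v_r = -(np.cos(az) * true_v[0] + np.sin(az) * true_v[1])
print(np.round(estimate_ego_velocity(az, v_r), 3))  # → [2.  0.5]
```

A learning-based method like the one proposed here replaces this rigid static-world assumption with attention over spatial-temporal radar features, letting MOS and EVE reinforce each other instead of treating moving points purely as outliers.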