
Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals (2401.11499v1)

Published 21 Jan 2024 in cs.CV

Abstract: Learning the dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research direction for robotics and autonomous driving. Current self-supervised methods mainly rely on point correspondences between point clouds, which may introduce the problems of fake flow and inconsistency, hindering the model's ability to learn accurate and realistic motion. In this paper, we introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data to obtain supervision signals. We design three innovative supervision signals to preserve the inherent properties of scene motion: the masked Chamfer distance loss, the piecewise rigidity loss, and the temporal consistency loss. Through extensive experiments, we demonstrate that our proposed self-supervised framework outperforms all previous self-supervision methods on the motion prediction task.
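The paper does not spell out its loss implementations here, but the masked Chamfer distance it names is a standard construction: a symmetric nearest-neighbour distance between a warped source point cloud and a target point cloud, restricted by boolean masks to points considered valid (e.g. non-ground, in-range). The sketch below is a minimal NumPy illustration of that general idea, not the authors' code; the function name, mask semantics, and brute-force pairwise-distance approach are all assumptions for clarity.

```python
import numpy as np

def masked_chamfer_distance(pred_pts, target_pts, pred_mask=None, target_mask=None):
    """Symmetric Chamfer distance between two (N, 3) / (M, 3) point sets.

    Optional boolean masks exclude points (e.g. ground or out-of-range
    returns) from the nearest-neighbour matching. This is a brute-force
    O(N*M) sketch; real pipelines would use a KD-tree or GPU kernel.
    """
    if pred_mask is not None:
        pred_pts = pred_pts[pred_mask]
    if target_mask is not None:
        target_pts = target_pts[target_mask]
    # Pairwise squared distances, shape (N, M), via broadcasting.
    diff = pred_pts[:, None, :] - target_pts[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Average nearest-neighbour distance in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

In a self-supervised motion-prediction setup, `pred_pts` would be the source point cloud warped by the predicted BEV flow, so minimising this loss pulls the warped cloud onto the next frame without any motion labels; the masking is what suppresses the "fake flow" that unmatched or static points would otherwise induce.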
