DeFlow: Decoder of Scene Flow Network in Autonomous Driving (2401.16122v1)

Published 29 Jan 2024 in cs.CV and cs.RO

Abstract: Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving. Many networks with large-scale point clouds as input use voxelization to create a pseudo-image for real-time running. However, the voxelization process often results in the loss of point-specific features. This gives rise to a challenge in recovering those features for scene flow tasks. Our paper introduces DeFlow which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement. To further enhance scene flow estimation performance, we formulate a novel loss function that accounts for the data imbalance between static and dynamic points. Evaluations on the Argoverse 2 scene flow task reveal that DeFlow achieves state-of-the-art results on large-scale point cloud data, demonstrating that our network has better performance and efficiency compared to others. The code is open-sourced at https://github.com/KTH-RPL/deflow.


Summary

  • The paper presents a GRU-based decoder that recovers point-level features from voxel features, significantly improving scene flow estimation on large-scale point clouds.
  • The method introduces a loss function that addresses the imbalance between static and dynamic points, yielding state-of-the-art accuracy on dynamic points.
  • Empirical results on the Argoverse 2 dataset show reduced Endpoint Error, enhanced efficiency, and practical benefits for autonomous driving systems.

DeFlow: Decoder of Scene Flow Network in Autonomous Driving

The paper "DeFlow: Decoder of Scene Flow Network in Autonomous Driving" presents a framework for improving scene flow estimation on the large-scale point clouds central to autonomous driving. Scene flow estimation determines the 3D motion field of a scene by predicting the motion of its points, and accurate, efficient estimates help autonomous vehicles interpret and navigate dynamic environments, a challenge the authors address with measurable success.
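
In other words, for a point cloud observed at time $t$ the task is to predict a per-point 3D displacement field (the notation here is ours, for illustration only):

$$\mathcal{F} = \{\mathbf{f}_i \in \mathbb{R}^3\}_{i=1}^{N}, \qquad \mathbf{p}_i^{t} + \mathbf{f}_i \approx \mathbf{p}_i^{t+1},$$

where $\mathbf{p}_i^{t}$ is the position of point $i$ at time $t$ and $\mathbf{p}_i^{t+1}$ is the position it occupies at the next timestep.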

Methodological Innovations

DeFlow transitions from voxel-based features to more granular point-level features by integrating a Gated Recurrent Unit (GRU) refinement module. This addresses a long-standing issue for methods that voxelize large-scale point clouds: pooling points into voxels discards point-specific features that are essential for reliable scene flow estimation. The GRU refinement iteratively reconstructs detailed point-level features from the voxel embeddings, differentiating points that fall within the same voxel.
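
The released code is the authoritative reference; the PyTorch-style sketch below only illustrates the general shape of such voxel-to-point GRU refinement. The class name, tensor layouts, layer choices, and iteration count are illustrative assumptions, not DeFlow's actual implementation.

```python
import torch
import torch.nn as nn

class VoxelToPointRefiner(nn.Module):
    """Sketch of GRU-based voxel-to-point refinement (illustrative, not the
    authors' code): each point starts from the feature of the voxel/pillar it
    falls into, and a GRU cell iteratively updates that feature using the
    point's offset from the voxel center, so points sharing a voxel end up
    with distinct point-level features."""

    def __init__(self, feat_dim: int = 64, num_iters: int = 2):
        super().__init__()
        self.num_iters = num_iters
        self.offset_embed = nn.Linear(3, feat_dim)                        # embed per-point offsets
        self.gru = nn.GRUCell(input_size=feat_dim, hidden_size=feat_dim)
        self.flow_head = nn.Linear(feat_dim, 3)                           # per-point flow vector

    def forward(self, voxel_feats, point_to_voxel, point_offsets):
        # voxel_feats:    (V, feat_dim) features from the voxel/pillar backbone
        # point_to_voxel: (N,)          index of the voxel each point belongs to
        # point_offsets:  (N, 3)        point position relative to its voxel center
        h = voxel_feats[point_to_voxel]        # every point inherits its voxel's feature
        x = self.offset_embed(point_offsets)   # point-specific input to the GRU
        for _ in range(self.num_iters):
            h = self.gru(x, h)                 # refine toward a point-level feature
        return self.flow_head(h)               # (N, 3) estimated scene flow
```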

The authors also propose a loss function designed to mitigate the inherent imbalance between static and dynamic points in a scene: static points vastly outnumber dynamic ones, so an unweighted loss lets them dominate training. Their experiments show that this loss measurably improves the estimation of dynamic point motion, a critical factor for autonomous navigation.
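
The paper and repository define the exact loss; the snippet below is only a hedged sketch of the balancing idea, with an illustrative motion threshold and equal per-group weighting. Averaging the error separately over static and dynamic points prevents the far more numerous static points from dominating the gradient.

```python
import torch

def balanced_flow_loss(pred_flow, gt_flow, dyn_thresh=0.05):
    """Illustrative imbalance-aware loss (not DeFlow's exact formulation):
    per-point endpoint errors are averaged separately over static and dynamic
    points, split by ground-truth motion magnitude, then summed."""
    err = torch.norm(pred_flow - gt_flow, dim=-1)       # (N,) per-point endpoint error
    is_dyn = torch.norm(gt_flow, dim=-1) > dyn_thresh   # (N,) assumed dynamic/static split
    loss = err.new_zeros(())
    for mask in (is_dyn, ~is_dyn):
        if mask.any():
            loss = loss + err[mask].mean()              # equal weight for each group
    return loss
```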

Empirical Evaluation

DeFlow's performance and efficiency were evaluated on the Argoverse 2 scene flow task, where it achieves state-of-the-art results on large-scale point cloud data. The authors report clear reductions in Endpoint Error (EPE), with the largest gains on dynamic points, the cases most relevant to vehicles interpreting moving agents in real-world scenes.
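
For reference, Endpoint Error is the mean L2 distance between predicted and ground-truth flow vectors. The helper below reports it overall and split by motion state; it reflects the spirit of the Argoverse 2 evaluation rather than the official implementation, and the dynamic mask is assumed to be provided by the dataset.

```python
import torch

def endpoint_error(pred_flow, gt_flow, dynamic_mask):
    """Illustrative EPE metrics (meters): overall, dynamic-only, static-only.
    Not the official Argoverse 2 evaluation code."""
    err = torch.norm(pred_flow - gt_flow, dim=-1)   # (N,) per-point L2 error
    return {
        "EPE_all": err.mean().item(),
        "EPE_dynamic": err[dynamic_mask].mean().item() if dynamic_mask.any() else float("nan"),
        "EPE_static": err[~dynamic_mask].mean().item() if (~dynamic_mask).any() else float("nan"),
    }
```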

Comparative analyses with existing methods show that DeFlow advances accuracy while remaining computationally efficient: the authors report lower GPU memory consumption and higher processing speed than alternatives such as FastFlow3D.

Implications and Future Directions

The implications of DeFlow are twofold: practical and theoretical. Practically, its efficient handling of large-scale data and strong performance make it a good candidate for integration into real-time processing pipelines on autonomous vehicles. Since real-time capability is a stringent requirement in autonomous driving, this is a notable advance.

Theoretically, the paper opens avenues for future research in scene flow estimation. The results obtained by combining GRU refinement with a voxel-to-point transition suggest that similar designs may benefit other 3D motion understanding tasks. The proposed loss function is another promising direction; it could be adapted or refined for other data-imbalance problems.

In conclusion, the DeFlow technique offers a substantial contribution to the field of autonomous driving. While the paper primarily addresses the computational challenges associated with large-scale 3D point clouds, DeFlow also underscores the importance of targeted network architecture refinements and loss function designs, ultimately setting a new benchmark in scene flow estimation. Future work could explore scalability, potential for self-supervised learning, and incorporation of additional sensory modalities to further leverage DeFlow's foundational framework.
