ATPPNet: Attention based Temporal Point cloud Prediction Network (2401.17399v1)
Abstract: Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds support subsequent tasks such as object trajectory estimation for collision avoidance or estimating locations with the least odometry drift. In this work, we present ATPPNet, a novel architecture that predicts future point cloud sequences given a sequence of point clouds from previous time steps obtained with a LiDAR sensor. ATPPNet leverages Conv-LSTM along with channel-wise and spatial attention, dually complemented by a 3D-CNN branch, to extract an enhanced spatio-temporal context and recover high-fidelity predictions of future point clouds. We conduct extensive experiments on publicly available datasets and report strong performance, outperforming existing methods. We also conduct a thorough ablation study of the proposed architecture and provide an application study that highlights the potential of our model for tasks like odometry estimation.
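The abstract names three components: a Conv-LSTM for temporal modeling, channel-wise and spatial attention over its features, and a complementary 3D-CNN branch for spatio-temporal context. The PyTorch sketch below is a hypothetical composition of those components under assumptions not stated in the abstract (range-image input of shape `(B, T, C, H, W)`, squeeze-excite-style channel attention, fusion by addition, and the names `ConvLSTMCell`, `ChannelSpatialAttention`, `ATPPBlock`); it illustrates the idea and is not the authors' implementation.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal Conv-LSTM cell in the style of Shi et al., 2015."""

    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gate pre-activations at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class ChannelSpatialAttention(nn.Module):
    """Channel-wise attention (squeeze-excite style) followed by a
    spatial attention map; an assumed stand-in for the paper's modules."""

    def __init__(self, ch: int, r: int = 8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(
            nn.Conv2d(ch, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)   # reweight channels
        return x * self.spatial(x)  # reweight spatial locations


class ATPPBlock(nn.Module):
    """Hypothetical fusion of an attentive Conv-LSTM with a 3D-CNN branch.

    Consumes a range-image sequence (B, T, C, H, W) and returns a fused
    spatio-temporal feature map that a decoder could turn into future frames.
    """

    def __init__(self, in_ch: int = 1, hid_ch: int = 32):
        super().__init__()
        self.hid_ch = hid_ch
        self.cell = ConvLSTMCell(in_ch, hid_ch)
        self.attn = ChannelSpatialAttention(hid_ch)
        self.cnn3d = nn.Conv3d(in_ch, hid_ch, kernel_size=3, padding=1)

    def forward(self, seq):
        B, T, C, H, W = seq.shape
        h = seq.new_zeros(B, self.hid_ch, H, W)
        c = seq.new_zeros(B, self.hid_ch, H, W)
        for t in range(T):  # recurrent pass over the input frames
            h, c = self.cell(seq[:, t], (h, c))
        temporal = self.attn(h)                       # attended recurrent context
        volumetric = self.cnn3d(seq.transpose(1, 2))  # (B, hid, T, H, W)
        # Fusion by addition is an assumption; the paper may fuse differently.
        return temporal + volumetric.mean(dim=2)
```

As a usage sanity check, `ATPPBlock()(torch.randn(2, 5, 1, 64, 512))` yields a `(2, 32, 64, 512)` feature map for a 5-frame sequence of 64x512 range images (a typical spinning-LiDAR projection resolution).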