TULIP: Transformer for Upsampling of LiDAR Point Clouds (2312.06733v4)
Abstract: LiDAR upsampling is a challenging task for the perception systems of robots and autonomous vehicles, due to the sparse and irregular structure of large-scale scene contexts. Recent works address this problem by projecting LiDAR data from 3D Euclidean space into 2D image space and casting upsampling as an image super-resolution problem. Although these methods can generate high-resolution range images with fine-grained details, the resulting 3D point clouds often blur out details and contain invalid points. In this paper, we propose TULIP, a new method for reconstructing high-resolution LiDAR point clouds from low-resolution LiDAR input. We also follow a range-image-based approach, but specifically modify the patch and window geometries of a Swin-Transformer-based network to better fit the characteristics of range images. We conducted several experiments on three public real-world and simulated datasets. TULIP outperforms state-of-the-art methods in all relevant metrics and generates more robust and realistic point clouds than prior works.
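The projection from 3D Euclidean space into 2D image space mentioned in the abstract is commonly done via a spherical (range-image) projection: each point's azimuth and elevation angles are mapped to a pixel coordinate, and its range becomes the pixel value. The sketch below illustrates this standard projection; the field-of-view bounds, image size, and function name are illustrative assumptions (roughly matching a 64-beam spinning LiDAR), not values taken from the paper.

```python
import numpy as np

def pointcloud_to_range_image(points, h=64, w=1024,
                              fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image
    via spherical projection. FOV defaults are illustrative, not from
    the paper."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range per point
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to pixel coordinates:
    # azimuth -> column, elevation -> row.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (fov_up - pitch) / fov * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    img = np.zeros((h, w), dtype=np.float32)      # 0 marks empty pixels
    # Write farthest points first so nearer returns overwrite them,
    # keeping the closest return per pixel.
    order = np.argsort(-r)
    img[v[order], u[order]] = r[order]
    return img
```

A low-resolution input (e.g. 16 scan lines) projected this way yields a short, wide range image; upsampling it to 64 rows and re-projecting each valid pixel back to 3D recovers a denser point cloud, which is the image-space formulation the paper builds on.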