
TULIP: Transformer for Upsampling of LiDAR Point Clouds (2312.06733v4)

Published 11 Dec 2023 in cs.CV

Abstract: LiDAR Upsampling is a challenging task for the perception systems of robots and autonomous vehicles, due to the sparse and irregular structure of large-scale scene contexts. Recent works propose to solve this problem by converting LiDAR data from 3D Euclidean space into an image super-resolution problem in 2D image space. Although their methods can generate high-resolution range images with fine-grained details, the resulting 3D point clouds often blur out details and predict invalid points. In this paper, we propose TULIP, a new method to reconstruct high-resolution LiDAR point clouds from low-resolution LiDAR input. We also follow a range image-based approach but specifically modify the patch and window geometries of a Swin-Transformer-based network to better fit the characteristics of range images. We conducted several experiments on three public real-world and simulated datasets. TULIP outperforms state-of-the-art methods in all relevant metrics and generates robust and more realistic point clouds than prior works.
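The range-image-based approach described in the abstract starts by projecting the 3D point cloud onto a 2D image via spherical projection, so that 2D super-resolution architectures can be applied. The sketch below illustrates this standard projection step only (it is not TULIP's network); the image size, vertical field of view, and nearest-return tie-breaking are illustrative assumptions, not values from the paper.

```python
import numpy as np

def points_to_range_image(points, h=16, w=1024,
                          fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image.

    Each pixel stores the range (distance) of the point that falls into it;
    0 marks pixels with no return. Sensor geometry here is illustrative.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)            # range of each point
    yaw = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(r, 1e-8))    # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalize angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w             # column index from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * h      # row index from elevation
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int64)

    image = np.zeros((h, w), dtype=np.float32)
    # When several points map to the same pixel, keep the nearest return
    # by writing far points first so near ones overwrite them.
    order = np.argsort(-r)
    image[v[order], u[order]] = r[order]
    return image
```

Low-resolution input (e.g. 16 beams) and high-resolution ground truth (e.g. 64 beams) can both be produced this way, after which upsampling reduces to image super-resolution on the range image.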
