
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions (2403.12202v1)

Published 18 Mar 2024 in cs.CV

Abstract: In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
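The core 2D-to-3D step described in the abstract — uplifting image features into a point cloud via the initial depth estimate, then normalizing the cloud before feeding it to a point transformer — can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a standard pinhole unprojection and a simple centroid/scale normalization, and the function names are hypothetical.

```python
import numpy as np

def uplift_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a dense depth map (H, W) into a 3D point cloud (H*W, 3)
    using pinhole camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def normalize_point_cloud(points):
    """Center the cloud at its centroid and rescale it to unit max radius,
    so downstream attention layers see inputs in a consistent range."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / max(scale, 1e-8)
```

In practice, each 3D point would carry the corresponding 2D feature vector as its payload; the normalized coordinates serve as positional input to the point transformer.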

Authors (4)
  1. Yunxiao Shi (20 papers)
  2. Manish Kumar Singh (20 papers)
  3. Hong Cai (51 papers)
  4. Fatih Porikli (141 papers)
Citations (1)

