
AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation (2310.09739v3)

Published 15 Oct 2023 in cs.CV

Abstract: Unsupervised depth completion and estimation methods are trained by minimizing reconstruction error. Block artifacts from resampling, intensity saturation, and occlusions are amongst the many undesirable by-products of common data augmentation schemes that affect image reconstruction quality, and thus the training signal. Hence, typical image augmentations, viewed as essential to training pipelines in other vision tasks, have seen limited use beyond small intensity changes and flipping. The sparse depth modality in depth completion has seen even less use, as intensity transformations alter the scale of the 3D scene, and geometric transformations may decimate the sparse points during resampling. We propose a method that unlocks a wide range of previously infeasible geometric augmentations for unsupervised depth completion and estimation. This is achieved by reversing, or "undo"-ing, geometric transformations to the coordinates of the output depth, warping the depth map back to the original reference frame. This enables computing the reconstruction losses using the original images and sparse depth maps, eliminating the pitfalls of naive loss computation on the augmented inputs and allowing us to scale up augmentations to boost performance. We demonstrate our method on indoor (VOID) and outdoor (KITTI) datasets, where we consistently improve upon recent methods across both datasets, as well as in generalization to four other datasets. Code available at: https://github.com/alexklwong/augundo
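The following is a minimal PyTorch sketch of the "undo" idea described in the abstract, not the authors' implementation: geometrically augment the input, predict depth on the augmented image, then apply the inverse transformation to the output depth so the reconstruction loss is computed in the original reference frame. The 3x3 affine matrix in normalized coordinates, the stand-in depth_net, and the tensor sizes are all illustrative assumptions.

```python
# Sketch of augment -> predict -> undo, assuming an invertible affine
# augmentation expressed as a 3x3 matrix in normalized image coordinates.
import torch
import torch.nn.functional as F


def make_affine_grid(matrix, size):
    # Build a sampling grid from the top two rows of a batched 3x3 affine.
    theta = matrix[:, :2, :]  # (B, 2, 3)
    return F.affine_grid(theta, size, align_corners=False)


def augment_predict_undo(image, depth_net, matrix):
    """image: (B, 3, H, W); matrix: (B, 3, 3) invertible affine.
    Returns predicted depth warped back to the original reference frame."""
    B, _, H, W = image.shape
    # 1) Geometrically augment the input image (grid_sample pulls samples,
    #    so transforming by `matrix` uses its inverse as theta).
    grid_fwd = make_affine_grid(torch.linalg.inv(matrix), (B, 3, H, W))
    image_aug = F.grid_sample(image, grid_fwd, align_corners=False)
    # 2) Predict depth on the augmented image.
    depth_aug = depth_net(image_aug)  # (B, 1, H, W)
    # 3) "Undo": warp the predicted depth back with the inverse transform,
    #    so losses can be computed against the ORIGINAL image / sparse depth.
    grid_inv = make_affine_grid(matrix, (B, 1, H, W))
    return F.grid_sample(depth_aug, grid_inv, align_corners=False)


if __name__ == "__main__":
    # Toy usage: a horizontal flip as the augmentation and an identity-like
    # "network" standing in for a real depth predictor.
    B, H, W = 2, 64, 64
    image = torch.rand(B, 3, H, W)
    flip = torch.tensor([[-1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]]).expand(B, 3, 3)
    depth_net = lambda x: x.mean(dim=1, keepdim=True)  # stand-in predictor
    depth = augment_predict_undo(image, depth_net, flip)
    # Reconstruction-style loss in the original frame (dummy target here).
    loss = F.l1_loss(depth, torch.ones_like(depth))
    print(depth.shape, loss.item())
```

Because the loss is taken after the inverse warp, resampling artifacts introduced by the augmentation never contaminate the training targets, which is what lets aggressive geometric augmentations be used at all.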
