Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion (2404.07545v1)
Abstract: Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for autonomous driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by their low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network with Semi-Dense hint Guidance, named SDG-Depth. Our network includes a deformable propagation module that generates a semi-dense hint map and a confidence map by propagating sparse hints within a learned deformable window. These maps then guide cost aggregation in stereo matching. To reduce the triangulation error in depth recovery from disparity, especially in distant regions, we introduce a disparity-depth conversion module. Our method is both accurate and efficient. Experimental results on benchmark datasets demonstrate its superior performance. Our code is available at https://github.com/SJTU-ViSYS/SDG-Depth.
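To make the two ideas concrete, below is a minimal PyTorch-style sketch, not the authors' released implementation, of how such modules could look. The class names, the window size `k`, the offset/affinity heads, and the residual refinement head are all illustrative assumptions; only the overall pattern, propagating sparse hints through learned per-pixel sample locations and correcting triangulated depth, follows the abstract.

```python
# Illustrative sketch only; see https://github.com/SJTU-ViSYS/SDG-Depth
# for the actual SDG-Depth implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformablePropagation(nn.Module):
    """Propagates sparse LiDAR hints into a semi-dense hint map plus a
    confidence map, using k learned (deformable) sample locations per pixel."""

    def __init__(self, feat_ch: int, k: int = 9):
        super().__init__()
        self.k = k
        self.offset_head = nn.Conv2d(feat_ch, 2 * k, 3, padding=1)  # k 2-D offsets
        self.weight_head = nn.Conv2d(feat_ch, k, 3, padding=1)      # k affinity logits

    def forward(self, hints, valid, feat):
        # hints: (B,1,H,W) sparse disparities; valid: (B,1,H,W) 0/1 mask;
        # feat:  (B,C,H,W) image features at the same resolution.
        b, _, h, w = hints.shape
        offsets = self.offset_head(feat).view(b, self.k, 2, h, w)
        logits = self.weight_head(feat)

        # Base sampling grid in normalized [-1, 1] coordinates, (x, y) order.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=hints.device),
            torch.linspace(-1, 1, w, device=hints.device),
            indexing="ij")
        base = torch.stack((xs, ys), dim=-1)  # (H,W,2)

        props, masks = [], []
        norm = torch.tensor([w / 2.0, h / 2.0], device=hints.device)
        for i in range(self.k):
            off = offsets[:, i].permute(0, 2, 3, 1) / norm  # pixels -> normalized
            grid = base.unsqueeze(0) + off                  # (B,H,W,2)
            props.append(F.grid_sample(hints, grid, align_corners=True))
            masks.append(F.grid_sample(valid, grid, align_corners=True))

        props = torch.cat(props, dim=1)  # (B,k,H,W) sampled hint values
        masks = torch.cat(masks, dim=1)  # (B,k,H,W) sampled validity
        # Suppress samples that landed on empty hint pixels, then blend.
        w_aff = torch.softmax(logits + torch.log(masks.clamp_min(1e-6)), dim=1)
        semi_dense = (w_aff * props).sum(dim=1, keepdim=True)
        confidence = (w_aff * masks).sum(dim=1, keepdim=True)
        return semi_dense, confidence


class LearnedDispToDepth(nn.Module):
    """Triangulates depth from disparity, then predicts a per-pixel residual
    to compensate the long-range triangulation error (illustrative head)."""

    def __init__(self, feat_ch: int):
        super().__init__()
        self.residual_head = nn.Sequential(
            nn.Conv2d(feat_ch + 1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, disp, feat, focal: float, baseline: float):
        depth = focal * baseline / disp.clamp_min(1e-6)  # z = f*b/d
        return depth + self.residual_head(torch.cat((depth, feat), dim=1))
```

The conversion module targets a well-known weakness of triangulation: since z = f·b/d, a fixed disparity error ε yields a depth error of roughly z²·ε/(f·b), growing quadratically with distance, which is why distant regions benefit most from a learned correction.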
Authors: Ang Li, Anning Hu, Wei Xi, Wenxian Yu, Danping Zou