RadarCam-Depth: Radar-Camera Fusion for Depth Estimation with Learned Metric Scale (2401.04325v2)
Abstract: We present a novel approach for metric dense depth estimation based on the fusion of a single-view image and a sparse, noisy Radar point cloud. The direct fusion of heterogeneous Radar and image data, or their encodings, tends to yield dense depth maps with significant artifacts, blurred boundaries, and suboptimal accuracy. To circumvent this issue, we learn to augment versatile and robust monocular depth prediction with the dense metric scale induced from sparse and noisy Radar data. We propose a Radar-Camera framework for highly accurate and finely detailed dense depth estimation, comprising four stages: monocular depth prediction, global scale alignment of monocular depth with sparse Radar points, quasi-dense scale estimation through learning the association between Radar points and image patches, and local scale refinement of dense depth using a scale map learner. Our proposed method significantly outperforms the state-of-the-art Radar-Camera depth estimation methods, reducing the mean absolute error (MAE) of depth estimation by 25.6% and 40.2% on the challenging nuScenes dataset and our self-collected ZJU-4DRadarCam dataset, respectively. Our code and dataset will be released at \url{https://github.com/MMOCKING/RadarCam-Depth}.
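To make the second stage more concrete, below is a minimal sketch of what "global scale alignment of monocular depth with sparse Radar points" can look like: a single scalar is fitted so that the scaled monocular depth best matches the Radar ranges at the pixels where Radar points project. This is an illustrative assumption, not the authors' released implementation; the function name `align_global_scale` and the inputs `mono_depth`, `radar_uv`, and `radar_depth` are hypothetical.

```python
# Hedged sketch of a global scale alignment step (assumed formulation, not the
# paper's exact method): fit one scalar s minimizing the MAE between
# s * mono_depth and sparse Radar depths at their projected pixel locations.
import numpy as np
from scipy.optimize import minimize_scalar  # 1-D scalar optimization


def align_global_scale(mono_depth: np.ndarray,
                       radar_uv: np.ndarray,
                       radar_depth: np.ndarray) -> float:
    """Estimate a global metric scale for a scale-ambiguous monocular depth map.

    mono_depth  : (H, W) relative depth from the monocular network
    radar_uv    : (N, 2) integer pixel coordinates of projected Radar points
    radar_depth : (N,)   metric ranges of those Radar points
    """
    # Sample the monocular depth at the Radar pixel locations.
    sampled = mono_depth[radar_uv[:, 1], radar_uv[:, 0]]
    valid = (sampled > 1e-3) & (radar_depth > 1e-3)  # drop degenerate samples

    def mae(s: float) -> float:
        return float(np.mean(np.abs(s * sampled[valid] - radar_depth[valid])))

    # Bounded 1-D search over a plausible scale range (bounds are assumptions).
    res = minimize_scalar(mae, bounds=(0.1, 100.0), method="bounded")
    return float(res.x)


if __name__ == "__main__":
    # Usage with synthetic stand-in data: noisy Radar ranges at 7.5x the
    # relative depth should yield a scale estimate near 7.5.
    rng = np.random.default_rng(0)
    mono = rng.uniform(0.5, 2.0, size=(480, 640))
    uv = np.stack([rng.integers(0, 640, 50), rng.integers(0, 480, 50)], axis=1)
    radar = 7.5 * mono[uv[:, 1], uv[:, 0]] + rng.normal(0.0, 0.2, 50)
    print("estimated global scale:", align_global_scale(mono, uv, radar))
```

The remaining stages (quasi-dense scale estimation and local scale refinement) are learned components described in the paper and are not reproduced here.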