MonoCD: Monocular 3D Object Detection with Complementary Depths (2404.03181v1)
Abstract: Monocular 3D object detection has attracted widespread attention because it can accurately localize objects in 3D from a single image at low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection because the 2D-to-3D mapping is ill-posed. Many methods explore multiple local depth clues, such as object heights and keypoints, and then formulate object depth estimation as an ensemble of multiple depth predictions to compensate for the limited information in any single depth. However, the errors of these multiple depths tend to have the same sign, which prevents them from canceling each other out and limits the accuracy of the combined depth. To alleviate this problem, we propose two novel designs that increase the complementarity of the depths. First, we add a new depth prediction branch, named complementary depth, that uses global and efficient depth clues from the entire image rather than local clues, which reduces the correlation between depth predictions. Second, we fully exploit the geometric relations between multiple depth clues so that the depths are also complementary in form. Together, these designs give our method higher complementarity. Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data. In addition, complementary depth can serve as a lightweight, plug-and-play module that boosts several existing monocular 3D object detectors. Code is available at https://github.com/elvintanhust/MonoCD.
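The central argument above is that averaging several depth estimates only helps when their errors can cancel. The small numerical sketch below makes this concrete under explicit assumptions: (i) local depths come from the pinhole height relation z = f * H / h, using hypothetical pixel heights that are all slightly underestimated, so every local depth errs too far; (ii) depths are fused by inverse-variance weighting, a common choice in uncertainty-guided depth ensembles and not necessarily MonoCD's exact rule; and (iii) a hypothetical complementary estimate has an error of the opposite sign. The focal length, object height, and all depth values are made-up illustration numbers, not KITTI data, and the helper names are hypothetical.

```python
def depth_from_height(focal_px, height_m, height_px):
    """Pinhole relation behind height-based depth clues: z = f * H / h."""
    return focal_px * height_m / height_px


def combine_depths(depths, sigmas):
    """Inverse-variance weighted average (assumed fusion rule, w_i = 1 / sigma_i^2)."""
    weights = [1.0 / (s * s) for s in sigmas]
    return sum(w * d for w, d in zip(weights, depths)) / sum(weights)


# Hypothetical scene: focal length 720 px, a 1.5 m tall car at 20 m,
# so its true projected height is 720 * 1.5 / 20 = 54 px.
FOCAL_PX = 720.0
HEIGHT_M = 1.5
TRUE_DEPTH = 20.0

# Hypothetical local clues (e.g. heights between different keypoint pairs)
# that all underestimate the pixel height, so every local depth is too far.
noisy_heights_px = [51.4, 52.0, 51.0]
local_depths = [depth_from_height(FOCAL_PX, HEIGHT_M, h) for h in noisy_heights_px]

# Hypothetical complementary (global-clue) depth whose error has the opposite sign.
complementary_depth = 18.9

sigmas = [1.0] * 4  # equal uncertainties keep the example easy to follow
err_same = combine_depths(local_depths, sigmas[:3]) - TRUE_DEPTH
err_comp = combine_depths(local_depths + [complementary_depth], sigmas) - TRUE_DEPTH

print(f"local depths only:        error {err_same:+.2f} m")
print(f"plus complementary depth: error {err_comp:+.2f} m")
```

Running it prints an error of about +0.99 m for the local depths alone and about +0.46 m once the complementary estimate is added. The gain comes from the sign of the extra error rather than from the number of estimates: a fourth estimate that also erred too far would keep the fused depth biased upward.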
Authors:
- Longfei Yan
- Pei Yan
- Shengzhou Xiong
- Xuanyu Xiang
- Yihua Tan