Rotation Matters: Generalized Monocular 3D Object Detection for Various Camera Systems (2310.05366v1)
Abstract: Monocular 3D object detection is an active area of research, and its performance has improved steadily. However, 3D object detection performance drops significantly when a detector is applied to a camera system different from the one used to capture the training data. For example, a 3D detector trained on data from a passenger car mostly fails to regress accurate 3D bounding boxes for a camera mounted on a bus. In this paper, we conduct extensive experiments to analyze the factors that cause this degradation. We find that changes in camera pose relative to the road plane, especially camera orientation, cause the performance drop. We then propose a generalized 3D object detection method that can be applied to a wide range of camera systems. Specifically, we design a new compensation module that corrects the estimated 3D bounding box location and heading direction. The module can be applied to most recent 3D object detection networks, and it raises the AP$_{3D}$ score (KITTI moderate, IoU $> 70\%$) to roughly 6 to 10 times that of the baselines without additional training. Both quantitative and qualitative results demonstrate the effectiveness of the proposed method.
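The abstract does not specify how the compensation module works internally. As a rough illustration of the geometry involved, the sketch below re-expresses a box predicted in a pitched camera frame in a road-aligned frame by rotating its center and heading about the camera x-axis. It assumes KITTI camera coordinates (x right, y down, z forward), a known pitch-only camera rotation, and a rotation sign convention chosen for illustration. `compensate_box`, `rot_x`, and `matvec` are hypothetical names, and this is not the authors' module.

```python
import math


def rot_x(theta):
    """3x3 rotation matrix about the camera x-axis (pitch), as nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def matvec(m, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def compensate_box(center, ry, pitch):
    """Hypothetical sketch: map a box from a pitched camera frame to a road-aligned frame.

    center: (x, y, z) box center in the pitched camera frame (KITTI axes assumed).
    ry:     heading about the camera y-axis (KITTI rotation_y convention assumed).
    pitch:  assumed-known camera pitch relative to the road plane, in radians.
    """
    r = rot_x(pitch)
    # Rotate the box center into the road-aligned frame.
    new_center = matvec(r, list(center))
    # Heading direction in the pitched frame (ry = 0 points along +x in KITTI).
    d = matvec(r, [math.cos(ry), 0.0, -math.sin(ry)])
    # Project the rotated heading onto the road (x-z) plane and recover the yaw.
    new_ry = math.atan2(-d[2], d[0])
    return new_center, new_ry


if __name__ == "__main__":
    # Example: a car 20 m ahead, seen by a camera pitched by an assumed 5 degrees.
    center, ry = compensate_box((1.0, 1.5, 20.0), 0.3, math.radians(5.0))
    print(center, ry)
```

With zero pitch the function returns the input unchanged, and because the correction is a pure rotation it preserves the distance from the camera to the box center; a real module would also need to account for camera roll and the camera's height above the road.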