Weakly Supervised Monocular 3D Detection with a Single-View Image (2402.19144v1)
Abstract: Monocular 3D detection (M3D) aims for precise 3D object localization from a single-view image, which usually involves labor-intensive annotation of 3D detection boxes. Weakly supervised M3D has recently been studied to obviate the 3D annotation process by leveraging many existing 2D annotations, but it often requires extra training data such as LiDAR point clouds or multi-view images, which greatly limits its applicability and usability. We propose SKD-WM3D, a weakly supervised monocular 3D detection framework that exploits depth information to achieve M3D exclusively from a single-view image, without any 3D annotations or other training data. One key design in SKD-WM3D is a self-knowledge distillation framework, which transforms image features into 3D-like representations by fusing depth information and effectively mitigates the inherent depth ambiguity in monocular scenarios with little computational overhead at inference. In addition, we design an uncertainty-aware distillation loss and a gradient-targeted transfer modulation strategy, which facilitate knowledge acquisition and knowledge transfer, respectively. Extensive experiments show that SKD-WM3D clearly surpasses the state of the art and is even on par with many fully supervised methods.
- Xueying Jiang
- Sheng Jin
- Lewei Lu
- Xiaoqin Zhang
- Shijian Lu