OriCon3D: Effective 3D Object Detection using Orientation and Confidence (2304.14484v3)

Published 27 Apr 2023 in cs.CV

Abstract: In this paper, we propose an advanced methodology for the detection of 3D objects and precise estimation of their spatial positions from a single image. Unlike conventional frameworks that rely solely on center-point and dimension predictions, our research leverages a deep convolutional neural network-based 3D object weighted orientation regression paradigm. These estimates are then seamlessly integrated with geometric constraints obtained from a 2D bounding box, resulting in the derivation of a comprehensive 3D bounding box. Our novel network design encompasses two key outputs. The first output involves the estimation of 3D object orientation through the utilization of a discrete-continuous loss function. Simultaneously, the second output predicts objectivity-based confidence scores with minimal variance. Additionally, we introduce enhancements to our methodology through the incorporation of lightweight residual feature extractors. By combining the derived estimates with the geometric constraints inherent in the 2D bounding box, our approach significantly improves the accuracy of 3D object pose determination, surpassing baseline methodologies. Our method is rigorously evaluated on the KITTI 3D object detection benchmark, demonstrating superior performance.
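
The discrete-continuous orientation output described in the abstract follows the MultiBin idea of Mousavian et al. (CVPR 2017): the network classifies which angular bin the object's observation angle falls into and regresses a continuous (cos, sin) residual within that bin. The sketch below illustrates how such an output head could be decoded at inference time; the bin count, tensor layout, and function names are illustrative assumptions, since the abstract does not specify OriCon3D's exact parameterization.

import numpy as np

# Hypothetical sketch of discrete-continuous ("MultiBin") orientation decoding,
# following the formulation of Mousavian et al. (CVPR 2017) that this paper
# builds on. Bin count, bin layout, and output names are assumptions, not
# details taken from the OriCon3D abstract.

NUM_BINS = 2  # assumed; Mousavian et al. report 2 bins for KITTI cars

def bin_centers(num_bins: int) -> np.ndarray:
    """Centers of equally sized orientation bins covering [-pi, pi)."""
    return -np.pi + (np.arange(num_bins) + 0.5) * (2 * np.pi / num_bins)

def decode_orientation(bin_conf: np.ndarray, bin_residual: np.ndarray) -> float:
    """Recover a single local (observation) angle from per-bin network outputs.

    bin_conf:     (NUM_BINS,) softmax confidence per orientation bin
    bin_residual: (NUM_BINS, 2) predicted (cos, sin) offset from each bin center
    """
    centers = bin_centers(len(bin_conf))
    best = int(np.argmax(bin_conf))                # pick the most confident bin
    cos_off, sin_off = bin_residual[best]
    offset = np.arctan2(sin_off, cos_off)          # continuous residual within the bin
    angle = centers[best] + offset
    return float((angle + np.pi) % (2 * np.pi) - np.pi)  # wrap to [-pi, pi)

The decoded angle is the local (observation) orientation; adding the ray angle from the camera to the 2D box center gives the global yaw, which, together with the predicted dimensions and the constraint that the 3D box must project tightly into the 2D box, fixes the remaining translation degrees of freedom.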

