
You Only Look Bottom-Up for Monocular 3D Object Detection (2401.15319v1)

Published 27 Jan 2024 in cs.CV

Abstract: Monocular 3D object detection is an essential task for autonomous driving. However, accurate 3D object detection from images alone is very challenging due to the loss of depth information. Most existing image-based methods infer objects' locations in 3D space from their 2D sizes on the image plane, which usually ignores the intrinsic positional clues in images and leads to unsatisfactory performance. Motivated by the fact that humans can leverage bottom-up positional clues to locate objects in 3D space from a single image, in this paper we explore position modeling from the image feature column and propose a new method named You Only Look Bottom-Up (YOLOBU). Specifically, YOLOBU leverages Column-based Cross Attention to determine how much a pixel contributes to the pixels above it. Next, the Row-based Reverse Cumulative Sum (RRCS) is introduced to build connections between pixels in the bottom-up direction. YOLOBU fully exploits positional clues for monocular 3D detection by relating pixels in a bottom-up manner. Extensive experiments on the KITTI dataset demonstrate the effectiveness and superiority of our method.
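The abstract only names the two key operations, so the following is a minimal sketch of the Row-based Reverse Cumulative Sum idea: accumulating feature rows from the bottom of the image upward so that every pixel aggregates the evidence beneath it. The function name, tensor shapes, and the plain unweighted cumsum formulation are assumptions for illustration, not the authors' implementation, which combines this step with the Column-based Cross Attention weights.

```python
import torch


def reverse_cumsum_bottom_up(feat: torch.Tensor) -> torch.Tensor:
    """Accumulate a feature map from the bottom row upward along the height axis.

    feat: (B, C, H, W) tensor; row H-1 is the bottom of the image.
    Returns a tensor of the same shape in which each row sums all rows
    at or below it, so every pixel "sees" the ground region beneath it.
    Hypothetical sketch of the RRCS idea, not the paper's exact operator.
    """
    # Flip along height so the bottom row comes first, accumulate, flip back.
    flipped = torch.flip(feat, dims=[2])
    accumulated = torch.cumsum(flipped, dim=2)
    return torch.flip(accumulated, dims=[2])


if __name__ == "__main__":
    x = torch.randn(1, 64, 96, 320)  # assumed feature-map size for illustration
    y = reverse_cumsum_bottom_up(x)
    print(y.shape)  # torch.Size([1, 64, 96, 320])
```

In the paper's formulation the contribution of each lower pixel would first be weighted by the column-wise attention scores rather than summed uniformly as above.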
