
GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection (2409.01816v2)

Published 3 Sep 2024 in cs.CV

Abstract: Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection, demonstrating impressive perceptual capabilities. However, existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state and failing to restore the authentic geometric information of the scene. In this paper, we identify the drawbacks of previous approaches that limit the geometric quality of BEV representation and propose Radial-Cartesian BEV Sampling (RC-Sampling), which outperforms other feature transformation methods in efficiently generating high-resolution dense BEV representation to restore fine-grained geometric information. Additionally, we design a novel In-Box Label to substitute the traditional depth label generated from the LiDAR points. This label reflects the actual geometric structure of objects rather than just their surfaces, injecting real-world geometric information into the BEV representation. In conjunction with the In-Box Label, Centroid-Aware Inner Loss (CAI Loss) is developed to capture the inner geometric structure of objects. Finally, we integrate the aforementioned modules into a novel multi-view 3D object detector, dubbed GeoBEV, which achieves a state-of-the-art result of 66.2% NDS on the nuScenes test set. The code is available at https://github.com/mengtan00/GeoBEV.git.
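
The abstract does not spell out how the In-Box Label is constructed, so the following is only a rough illustration of the underlying idea: marking 3D locations that fall inside an annotated box, rather than only the surface points a LiDAR return would hit. It is a minimal sketch, not the authors' implementation. The box layout `(cx, cy, cz, length, width, height, yaw)` and the function names `point_in_box` and `in_box_mask` are assumptions made for this example.

```python
import math


def point_in_box(point, box):
    """Return True if a 3D point lies inside a yaw-rotated 3D box.

    point: (x, y, z) coordinates.
    box: (cx, cy, cz, length, width, height, yaw), with yaw in radians
         about the z axis. This layout is an assumption for illustration.
    """
    x, y, z = point
    cx, cy, cz, length, width, height, yaw = box
    # Translate so the box center is the origin.
    dx, dy, dz = x - cx, y - cy, z - cz
    # Rotate by -yaw so the box becomes axis-aligned.
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    local_x = cos_y * dx + sin_y * dy
    local_y = -sin_y * dx + cos_y * dy
    # Compare against the half-extents along each box axis.
    return (
        abs(local_x) <= length / 2
        and abs(local_y) <= width / 2
        and abs(dz) <= height / 2
    )


def in_box_mask(points, box):
    """Label every point with whether it falls inside the box."""
    return [point_in_box(p, box) for p in points]
```

For example, with a box centered at `(10, 0, 0)`, of size 4 x 2 x 1.5 and rotated by 90 degrees, `in_box_mask([(10, 0, 0), (10, 1.5, 0), (11.5, 0, 0)], (10, 0, 0, 4, 2, 1.5, math.pi / 2))` marks the first two points as inside and the third as outside, because the long axis of the box now runs along the world y axis.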

