Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 178 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 38 tok/s Pro
GPT-5 High 30 tok/s Pro
GPT-4o 73 tok/s Pro
Kimi K2 231 tok/s Pro
GPT OSS 120B 427 tok/s Pro
Claude Sonnet 4.5 38 tok/s Pro
2000 character limit reached

OnlineBEV: Recurrent Temporal Fusion in Bird's Eye View Representations for Multi-Camera 3D Perception (2507.08644v1)

Published 11 Jul 2025 in cs.CV

Abstract: Multi-view camera-based 3D perception can be conducted using bird's eye view (BEV) features obtained through perspective view-to-BEV transformations. Several studies have shown that the performance of these 3D perception methods can be further enhanced by combining sequential BEV features obtained from multiple camera frames. However, even after compensating for the ego-motion of an autonomous agent, the performance gain from temporal aggregation is limited when combining a large number of image frames. This limitation arises due to dynamic changes in BEV features over time caused by object motion. In this paper, we introduce a novel temporal 3D perception method called OnlineBEV, which combines BEV features over time using a recurrent structure. This structure increases the effective number of combined features with minimal memory usage. However, it is critical to spatially align the features over time to maintain strong performance. OnlineBEV employs the Motion-guided BEV Fusion Network (MBFNet) to achieve temporal feature alignment. MBFNet extracts motion features from consecutive BEV frames and dynamically aligns historical BEV features with current ones using these motion features. To enforce temporal feature alignment explicitly, we use Temporal Consistency Learning Loss, which captures discrepancies between historical and target BEV features. Experiments conducted on the nuScenes benchmark demonstrate that OnlineBEV achieves significant performance gains over the current best method, SOLOFusion. OnlineBEV achieves 63.9% NDS on the nuScenes test set, recording state-of-the-art performance in the camera-only 3D object detection task.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.