Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction (2304.00967v1)

Published 3 Apr 2023 in cs.CV

Abstract: In this paper, we propose a new paradigm, named Historical Object Prediction (HoP) for multi-view 3D detection to leverage temporal information more effectively. The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that enforcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning. First, we elaborately design short-term and long-term temporal decoders, which can generate the pseudo BEV feature for timestamp t-k without the involvement of its corresponding camera images. Second, an additional object decoder is flexibly attached to predict the object targets using the generated pseudo BEV feature. Note that we only perform HoP during training, thus the proposed method does not introduce extra overheads during inference. As a plug-and-play approach, HoP can be easily incorporated into state-of-the-art BEV detection frameworks, including BEVFormer and BEVDet series. Furthermore, the auxiliary HoP approach is complementary to prevalent temporal modeling methods, leading to significant performance gains. Extensive experiments are conducted to evaluate the effectiveness of the proposed HoP on the nuScenes dataset. We choose the representative methods, including BEVFormer and BEVDet4D-Depth to evaluate our method. Surprisingly, HoP achieves 68.5% NDS and 62.4% mAP with ViT-L on nuScenes test, outperforming all the 3D object detectors on the leaderboard. Codes will be available at https://github.com/Sense-X/HoP.

Summary

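The paper introduces Historical Object Prediction (HoP), a training-time auxiliary task in which a multi-view 3D detector predicts the object set at a past timestamp t-k from a pseudo BEV feature generated from the adjacent frames.
Short-term and long-term temporal decoders produce that pseudo BEV feature without the camera images of timestamp t-k, and an additional object decoder predicts the targets from it; HoP runs only during training, so it adds no inference overhead.
Evaluated with BEVFormer and BEVDet4D-Depth on nuScenes, HoP reaches 68.5% NDS and 62.4% mAP with ViT-L on the test set.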
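
To make the training scheme described in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`pseudo_bev`, `object_head`, `hop_training_loss`), the averaging used for the short-term and long-term branches, the equal-weight fusion, the linear object head, and the squared-error loss are all illustrative assumptions. Only the overall flow follows the abstract: drop frame t-k, rebuild its BEV feature from other frames, predict objects from the rebuilt feature, and add the result as an auxiliary loss during training.

```python
import numpy as np

def pseudo_bev(history, k):
    """Build a stand-in BEV feature for frame t-k without using frame t-k itself.

    history: array of shape (T, C, H, W), ordered oldest to newest;
             the last entry is the current frame t.
    """
    T = history.shape[0]
    idx = T - 1 - k  # position of frame t-k in the sequence
    if not 0 < idx < T - 1:
        raise ValueError("frame t-k needs a neighbour on each side")
    # Assumed short-term branch: average of the two adjacent frames.
    short_term = 0.5 * (history[idx - 1] + history[idx + 1])
    # Assumed long-term branch: average of every frame except t-k.
    keep = np.arange(T) != idx
    long_term = history[keep].mean(axis=0)
    # Assumed fusion: equal-weight blend of the two branches.
    return 0.5 * (short_term + long_term)

def object_head(bev, weight):
    """Toy object decoder: linear map from C channels to one score per BEV cell."""
    return np.tensordot(weight, bev, axes=([0], [0]))  # shape (H, W)

def hop_training_loss(history, k, target_tk, weight, main_loss, aux_weight=1.0):
    """Add the auxiliary historical-prediction loss to the detector's main loss."""
    pred = object_head(pseudo_bev(history, k), weight)
    aux = float(np.mean((pred - target_tk) ** 2))  # placeholder for a detection loss
    return main_loss + aux_weight * aux, aux

# Usage with synthetic data: 4 frames, 2 channels, 3x3 BEV grid.
history = np.ones((4, 2, 3, 3))
weight = np.array([1.0, 1.0])
target = np.zeros((3, 3))
total, aux = hop_training_loss(history, k=1, target_tk=target,
                               weight=weight, main_loss=0.5)
```

Because the auxiliary branch is only called inside the training loss, a detector trained this way would run at inference exactly as it did before, which matches the abstract's claim of no extra inference overhead.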

Overview of the ICCV Author Guidelines Document

The paper "LaTeX Author Guidelines for ICCV Proceedings" serves as a comprehensive reference for authors submitting manuscripts for the International Conference on Computer Vision (ICCV). It establishes a detailed set of formatting, submission, and review criteria aimed at standardizing submissions to align with the conference's publishing framework. This consistency is essential for both the review process and the eventual integration into IEEE Xplore. The document's instructions are primarily targeted towards authors using the LaTeX document preparation system, reflective of its widespread adoption in the computer science community.

Key Aspects of the Guidelines

Language and Submission Policies:
- The guidelines require manuscripts to be submitted in English.
- The dual-submission policy is not spelled out in the document itself; authors are directed to the external ICCV guidelines for specifics.
Page Limitations and Formatting:
- Papers should not exceed eight pages, excluding references. Overlength papers are rejected without review, underscoring the importance of strict adherence to the specified page limit.
- Draft submissions must include the printed ruler so that reviewers can give precise feedback; the ruler must be removed in the camera-ready copy.
Blind Review Process:
- The guidelines emphasize the strategy for anonymizing submissions. Authors are advised to avoid self-identification in citations and to appropriately phrase references to their own prior work.
Mathematical Expressions:
- The guidelines call for numbered sections and numbered displayed equations, so that any equation in the manuscript can be referenced unambiguously.
Illustrations and Graphics:
- Authors should prioritize high-quality graphics with careful consideration of font sizes and line widths, ensuring readability in both digital and printed formats.
Formatting Specifics:
- A two-column text format is mandatory, with defined margins and font styles. This ensures uniformity across submissions and aids in the digital publication process.

Implications for Authors and Reviewers

The document provides a structured framework that enhances the clarity and professionalism of submissions. This format not only streamlines the review process but also ensures that published works meet the high standards expected by the ICCV and its associated readership. Authors benefit from following these guidelines as it potentially improves the visibility and perception of their work. For reviewers, the standardized format aids in evaluating each paper's merits with efficiency and focus on content quality.

Future Considerations

As the field of computer vision evolves, so too might the requirements for manuscript preparation. This could include adaptations for new types of content, such as interactive data visualizations or multimedia elements, which are becoming increasingly relevant. Furthermore, with the growing emphasis on open science, there might be future inclusions related to preprint servers or supplementary material repositories.

In summary, this paper on author guidelines is an essential resource for researchers aiming to contribute to ICCV, ensuring a coherent submission process aligned with professional and academic standards.

GitHub

GitHub - Sense-X/HoP: [ICCV 2023] Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction (188 stars)