M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers (2104.11896v3)

Published 24 Apr 2021 in cs.CV

Abstract: We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids. M3DeTR is the first approach that unifies multiple point cloud representations, feature scales, as well as models mutual relationships between point clouds simultaneously using transformers. We perform extensive ablation experiments that highlight the benefits of fusing representation and scale, and modeling the relationships. Our method achieves state-of-the-art performance on the KITTI 3D object detection dataset and Waymo Open Dataset. Results show that M3DeTR improves the baseline significantly by 1.48% mAP for all classes on Waymo Open Dataset. In particular, our approach ranks 1st on the well-known KITTI 3D Detection Benchmark for both car and cyclist classes, and ranks 1st on Waymo Open Dataset with single frame point cloud input. Our code is available at: https://github.com/rayguan97/M3DETR.


Summary

The paper introduces a novel transformer-based architecture that leverages diverse data representations to enhance 3D object detection.
It employs multi-scale processing to capture varied object sizes, thereby improving detection accuracy across complex scenes.
The approach demonstrates significant performance gains on standard benchmarks, underscoring its potential in real-world applications.

An Overview of Author Guidelines for ICCV Proceedings

The paper "LaTeX Author Guidelines for ICCV Proceedings" explains how to prepare and format manuscripts for submission to the International Conference on Computer Vision (ICCV). It covers manuscript preparation in LaTeX, the blind review process, and the conference's formatting requirements. This overview summarizes the technical guidelines that authors should check before submitting.

Main Structure and Formatting Requirements

The paper sets out several key formatting requirements. Manuscripts must use a two-column layout with prescribed fonts and measurements. The main title should be centered and set in 14-point boldface Times, while the main text should be in 10-point single-spaced Times, fully justified. These rules keep submissions uniform and maintain the professional standard of the ICCV proceedings.

Page layout is also strictly defined, with precise margins and spacing to ensure consistency. Submissions are limited to eight pages, excluding references, and overlength papers will not be reviewed. Authors are cautioned against altering the margins or formatting to fit more content within that limit.
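
As a rough illustration of the layout described above, the following preamble uses only standard LaTeX options. It is a sketch, not the official template: the letter paper size and the placeholder title are assumptions, and the conference style file (not shown) is what actually fixes margins, column spacing, and title size.

```latex
% Sketch only: plain LaTeX, not the official ICCV style file.
\documentclass[10pt,twocolumn,letterpaper]{article} % 10-point body, two columns (paper size assumed)
\usepackage{times}                                   % Times as the text font

% \Large is 14.4pt at a 10pt base size, close to the 14-point bold title.
\title{\Large\bfseries A Placeholder Title}
\author{Anonymous ICCV submission}
\date{}

\begin{document}
\maketitle
Body text is single-spaced and fully justified by default.
\end{document}
```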

Blind Review and Anonymization

The document emphasizes the importance of blind review and gives clear guidance on preserving anonymity while still allowing self-citation. It advises against first-person wording such as "my" or "our" when referring to one's prior work, and suggests citing that work in the third person instead. It further clarifies that authors need not remove citations to their own work, provided the references themselves do not break anonymity.
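
The difference between the two phrasings can be sketched as follows. The author names and the citation key `smith2020` are hypothetical placeholders, not entries from any real bibliography.

```latex
\documentclass[10pt,twocolumn]{article}
\begin{document}
% Breaks anonymity: the first-person pronoun identifies the authors.
%   In our previous work~\cite{smith2020}, we showed that ...

% Preserves anonymity: the same citation, phrased in the third person.
Smith and Jones~\cite{smith2020} showed that ...

\begin{thebibliography}{1}
% Hypothetical entry, for illustration only.
\bibitem{smith2020} A.~Smith and B.~Jones. A hypothetical prior paper. 2020.
\end{thebibliography}
\end{document}
```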

Linguistic and Mathematical Precision

All manuscripts must be in English and follow proper English writing conventions, with precise wording that keeps the text clear. The paper also requires that all sections and displayed equations be numbered, so that readers and reviewers can refer to them easily.

Formatting for mathematical expressions is given due consideration, ensuring they remain consistent with the document’s overall aesthetic. Authors are advised on how best to integrate mathematics into their text without disrupting the document’s flow.
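
A minimal sketch of a numbered section and a numbered displayed equation in standard LaTeX is shown below. The section title, the label `eq:loss`, and the formula itself are illustrative placeholders.

```latex
\documentclass[10pt,twocolumn]{article}
\begin{document}
\section{Method} % sections are numbered automatically

The loss is defined in Eq.~(\ref{eq:loss}).
\begin{equation} % the equation environment assigns a number
  L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
  \label{eq:loss}
\end{equation}
\end{document}
```

Using `equation` rather than an unnumbered display keeps every displayed formula referable, including those the authors do not cite themselves.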

Figures, Tables, and References

Figures and tables are to be centered and kept consistent with the surrounding text. The document specifies figure sizes relative to the text width and requires that graphical content remain legible when printed. Authors are instructed to use the appropriate LaTeX features when formatting figures and tables.
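
One way to size a centered figure relative to the column width is sketched here. The 0.8 fraction, the caption, and the label are assumptions chosen for illustration; a real figure would replace the placeholder box with `\includegraphics` from the `graphicx` package.

```latex
\documentclass[10pt,twocolumn]{article}
\begin{document}
\begin{figure}[t]
  \centering
  % Placeholder box spanning 80% of the column width (fraction assumed).
  \fbox{\rule{0pt}{1in}\rule{0.8\linewidth}{0pt}}
  \caption{A placeholder caption.}
  \label{fig:placeholder}
\end{figure}

Figure~\ref{fig:placeholder} scales with the column width.
\end{document}
```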

The reference section has its own format: single-spaced 9-point Times, with citations numbered in the text. These rules keep referencing consistent with academic standards and make cited work easy to trace.

Implications and Future Speculations

Although the paper is primarily a set of submission guidelines, it has real consequences for the submission and review process. When every manuscript follows the same formatting and submission instructions, reviewers can assess each one on the merit of its content and scientific contribution rather than on its presentation. Organizers of peer-reviewed conferences in other fields may find similarly detailed guidelines useful for improving consistency and fairness.

As AI and computer vision research continues to evolve, such structured guidelines will ensure that the ICCV maintains its standing as a leading forum for sharing cutting-edge findings. These guidelines also suggest potential automation processes in manuscript formatting that could be further explored to simplify authors' preparation efforts.

In conclusion, "LaTeX Author Guidelines for ICCV Proceedings" is an essential guide for authors aiming to submit their work to ICCV, ensuring a standard level of quality and professionalism. By closely following these guidelines, authors contribute to the high-quality publications expected from such a prestigious conference, thereby supporting the dissemination of advancements within the computer vision community.

GitHub

GitHub - rayguan97/M3DETR: Code base for M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers (95 stars)
