Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution (2007.16100v2)

Published 31 Jul 2020 in cs.CV

Abstract: Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able to recognize small instances (e.g., pedestrians, cyclists) very well due to the low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips the vanilla Sparse Convolution with the high-resolution point-based branch. With negligible overhead, this point-based branch is able to preserve the fine details even from large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard. It also achieves 8x computation reduction and 3x measured speedup over MinkowskiNet with higher accuracy. Finally, we transfer our method to 3D object detection, and it achieves consistent improvements over the one-stage detection baseline on KITTI.

Citations (579)

Summary

The paper introduces the SPVConv module that combines voxel and point-based branches to preserve detail in sparse 3D perception.
It proposes 3D-NAS, a framework that searches an SPVConv-based design space; the resulting SPVNAS model outperforms MinkowskiNet by 3.3% mIoU on SemanticKITTI and can reduce computation by up to 8x.
The approach also transfers to 3D object detection, where it consistently improves on the one-stage detection baseline on KITTI.
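The mIoU figure quoted above is the per-class intersection-over-union averaged across classes. The following is a minimal sketch of that metric on flat label lists, assuming classes that appear in neither the prediction nor the ground truth are skipped; benchmark toolkits may accumulate counts differently, and the function name `mean_iou` is illustrative rather than taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class intersection-over-union over classes that occur."""
    ious = []
    for c in range(num_classes):
        # Points labeled c by both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points labeled c by either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: skip classes absent from both
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.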

Summary of "Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution"

The paper presents a novel approach to improving 3D perception models, particularly in the context of autonomous driving, where the recognition of small objects like pedestrians and cyclists is crucial. The authors introduce Sparse Point-Voxel Convolution (SPVConv), a lightweight module that enhances the capabilities of Sparse Convolution by integrating a high-resolution point-based branch. This innovation aims to preserve fine details in large outdoor scenes with minimal computational overhead.

Key Contributions

SPVConv Module: The SPVConv module is designed to address the information loss seen in prior methods due to coarse voxelization or aggressive downsampling. By maintaining a high-resolution point-based branch alongside the sparse voxel-based structure, SPVConv ensures better feature retention for small objects across large scenes.
3D Neural Architecture Search (3D-NAS): The paper also introduces 3D-NAS, an architecture search framework that optimizes 3D models within a diverse design space defined by SPVConv. This framework enables the identification of optimal network architectures that balance accuracy and efficiency, considering hardware constraints.
Performance Improvement: Empirical results show that the SPVNAS model found by the architecture search outperforms the state-of-the-art MinkowskiNet by 3.3% in mean intersection-over-union (mIoU), ranking 1st on the SemanticKITTI leaderboard. It also offers up to 8x computation reduction and a 3x measured speedup over MinkowskiNet while retaining higher accuracy.
Transferability: The method also transfers to 3D object detection, where it consistently improves on the one-stage detection baseline on the KITTI dataset.
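The two-branch idea behind SPVConv can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not the authors' implementation: a dictionary stands in for the sparse voxel hash table, `voxel_fn` is a placeholder for sparse convolution, `point_fn` is a placeholder for the per-point transform, devoxelization uses a nearest-voxel lookup instead of interpolation, and the branches are fused by addition. All function names are hypothetical.

```python
def voxelize(points, feats, voxel_size):
    """Average point features per occupied voxel, keyed by integer coordinates."""
    sums, counts, keys = {}, {}, []
    for (x, y, z), f in zip(points, feats):
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        keys.append(key)
        if key not in sums:
            sums[key] = [0.0] * len(f)
            counts[key] = 0
        for i, v in enumerate(f):
            sums[key][i] += v
        counts[key] += 1
    voxels = {k: [v / counts[k] for v in s] for k, s in sums.items()}
    return voxels, keys


def devoxelize(voxels, keys):
    """Map voxel features back to points (nearest-voxel lookup, a simplification)."""
    return [voxels[k] for k in keys]


def spv_block(points, feats, voxel_size, voxel_fn, point_fn):
    """Coarse voxel branch plus full-resolution point branch, fused by addition."""
    voxels, keys = voxelize(points, feats, voxel_size)
    voxels = {k: voxel_fn(v) for k, v in voxels.items()}  # stand-in for sparse conv
    from_voxels = devoxelize(voxels, keys)
    from_points = [point_fn(f) for f in feats]  # per-point branch keeps fine detail
    return [[a + b for a, b in zip(v, p)] for v, p in zip(from_voxels, from_points)]


points = [(0.1, 0.1, 0.1), (0.4, 0.2, 0.3), (1.2, 0.0, 0.0)]
feats = [[1.0], [3.0], [10.0]]
out = spv_block(points, feats, 1.0,
                voxel_fn=lambda v: v,
                point_fn=lambda f: [0.5 * x for x in f])
print(out)
```

The first two points share a voxel, so the voxel branch gives them the same averaged feature; the point branch is what keeps their outputs distinct.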

Technical Analysis

Efficiency in Implementation: GPU hash tables significantly accelerate sparse voxelization and devoxelization, two steps that must be fast for real-time applications.
Network Design Space: The expansive design space of 3D-NAS, which includes fine-grained channel numbers and elastic network depths, allows for a vast exploration of candidate architectures, leading to optimized performance tailored to specific constraints.
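To make the design space concrete, the sketch below samples candidate architectures with per-layer channel counts and a variable number of layers per stage, then keeps those under a cost budget. It is a hedged illustration only: the stage sizes, channel ratios, the `approx_cost` proxy, and the use of plain random sampling are all assumptions made for this example and are not the search procedure or cost model described in the paper.

```python
import random


def sample_architecture(max_channels, max_depths, channel_ratios, rng):
    """Pick a depth per stage (elastic depth) and a channel count per layer."""
    arch = []
    for stage_channels, stage_depth in zip(max_channels, max_depths):
        depth = rng.randint(1, stage_depth)  # inclusive on both ends
        layers = [max(1, int(round(stage_channels * rng.choice(channel_ratios))))
                  for _ in range(depth)]
        arch.append(layers)
    return arch


def approx_cost(arch, in_channels):
    """Sum of input*output channel products: a crude proxy for computation."""
    cost, prev = 0, in_channels
    for layers in arch:
        for c in layers:
            cost += prev * c
            prev = c
    return cost


def sample_under_budget(n, budget, max_channels, max_depths, ratios, in_channels, seed=0):
    """Draw n random candidates and keep those whose proxy cost fits the budget."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        arch = sample_architecture(max_channels, max_depths, ratios, rng)
        if approx_cost(arch, in_channels) <= budget:
            kept.append(arch)
    return kept


max_channels, max_depths = [32, 64, 128], [2, 2, 3]
ratios = [0.5, 0.75, 1.0]
candidates = sample_under_budget(50, 20000, max_channels, max_depths, ratios, in_channels=4)
print(len(candidates), "candidates fit the budget")
```

In a real search, each surviving candidate would then be evaluated for accuracy; here the budget filter only shows how a resource constraint prunes the space.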

Implications and Future Directions

This work has both practical and methodological implications. Practically, SPVConv helps self-driving systems recognize small instances such as pedestrians and cyclists under limited hardware resources. Methodologically, search frameworks like 3D-NAS may encourage further work on automated architecture design, including in other domains of AI.

Future developments may explore the extension of such architectures to other 3D tasks or investigate the integration with different sensor modalities. Additionally, the continued refinement of neural architecture search methods could yield even more efficient, accurate models in various AI applications.

In summary, the paper presents a thoughtful advancement in 3D deep learning, offering both immediate applications in autonomous driving and broader implications for AI research and technology.
