- The paper introduces RPVNet, a deep fusion network combining range, point, and voxel representations for LiDAR segmentation using a gated fusion module and efficient interactions.
- RPVNet achieves state-of-the-art performance on large-scale datasets like SemanticKITTI and nuScenes, demonstrating robustness and efficiency without extra enhancements.
- Novel range-point-voxel interaction mechanisms, like efficient hash mapping, are proposed for robust feature propagation across views, offering generalizability to other multi-view problems.
An Analysis of RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation
The paper presents RPVNet, a novel approach to LiDAR point cloud segmentation that combines the strengths of three views (point-based, voxel-based, and range-based representations) into a unified framework. Through multi-view interactive learning, its deep fusion architecture exploits the complementary strengths of these views while compensating for their individual weaknesses.
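To make the range view concrete: a LiDAR scan is typically turned into a range image by spherical projection. The sketch below is a generic illustration rather than the paper's code; the image size and vertical field-of-view values are assumptions that roughly match a 64-beam sensor.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    fov_up / fov_down (degrees) are sensor-specific assumptions.
    """
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    depth = np.linalg.norm(points, axis=1)           # range of each point
    yaw = np.arctan2(points[:, 1], points[:, 0])     # horizontal angle
    pitch = np.arcsin(points[:, 2] / np.maximum(depth, 1e-8))  # vertical angle

    # Normalize angles to [0, 1] and scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down_rad) / fov) * H
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v, u] = depth                        # later points overwrite earlier ones
    return range_image, (v, u)
```

The returned `(v, u)` indices are what lets range-image features be gathered back to individual points later in the pipeline.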
Key Contributions
- Deep Fusion Architecture: RPVNet employs a three-branch network, one branch per representation, so that each view contributes its distinct advantages. A gated fusion module (GFM) adaptively merges features from the three views depending on context and data characteristics, ensuring the most informative features dominate the fused representation (a minimal sketch of such a gate follows this list).
- Efficient Interaction Mechanisms: Novel range-point-voxel (RPV) interaction mechanisms index and propagate features across views using hash-based mappings. This not only speeds up the fusion process but also generalizes to other view combinations in future research (see the hashing sketch after this list).
- Scalability and Performance: The paper reports that RPVNet achieves state-of-the-art results on the SemanticKITTI and nuScenes datasets, underscoring its robust performance in large-scale outdoor point cloud scenes. Notably, RPVNet ranks first on the SemanticKITTI leaderboard without utilizing additional enhancements that are common in other leading models.
- Solution to Class Imbalance: An instance CutMix augmentation technique addresses the common issue of class imbalance in the datasets by pasting instances of rare classes into training scans, yielding a more balanced training regimen and improved segmentation accuracy across classes (a simplified version appears below).
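As promised above, here is a minimal sketch of what a gated fusion over three per-point feature streams might look like. The linear gate and softmax normalization are assumptions for illustration, not the authors' exact design, and the sketch assumes range and voxel features have already been gathered back to per-point alignment.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal gated fusion over three per-point feature streams.

    Assumes range/voxel features were already mapped to points, so all
    three inputs are (N, C). The exact gating in RPVNet may differ.
    """
    def __init__(self, channels: int):
        super().__init__()
        # One gate logit per view, predicted from the concatenated features.
        self.gate = nn.Linear(3 * channels, 3)

    def forward(self, f_range, f_point, f_voxel):
        stacked = torch.stack([f_range, f_point, f_voxel], dim=1)  # (N, 3, C)
        logits = self.gate(torch.cat([f_range, f_point, f_voxel], dim=-1))
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)      # (N, 3, 1)
        return (weights * stacked).sum(dim=1)                      # fused (N, C)
```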
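The efficient interaction rests on mapping quantized point coordinates to voxel slots so features can be scattered into voxels and gathered back without dense grids. In the simplified sketch below, `torch.unique` (which sorts internally) stands in for the GPU hash map described in the paper; mean pooling is an assumed aggregation choice.

```python
import torch

def point_to_voxel_and_back(points_xyz, point_feats, voxel_size=0.05):
    """Scatter per-point features into voxels (mean pooling) and gather back.

    `inverse` is the point-to-voxel index that drives both directions,
    playing the role of the paper's hash-map lookup.
    """
    coords = torch.floor(points_xyz / voxel_size).long()            # quantize
    unique_coords, inverse = torch.unique(coords, dim=0, return_inverse=True)

    num_voxels, C = unique_coords.size(0), point_feats.size(1)
    voxel_feats = point_feats.new_zeros(num_voxels, C)
    counts = point_feats.new_zeros(num_voxels, 1)
    voxel_feats.index_add_(0, inverse, point_feats)                 # scatter-sum
    counts.index_add_(0, inverse, torch.ones_like(counts[inverse]))
    voxel_feats = voxel_feats / counts.clamp(min=1)                 # mean pool

    point_feats_from_voxels = voxel_feats[inverse]                  # gather back
    return voxel_feats, point_feats_from_voxels
```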
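Finally, a simplified version of instance-level CutMix: points of a rare-class instance are translated to a random spot in the target scan and concatenated. The placement radii are assumptions, and a full implementation would also check for collisions with existing geometry.

```python
import numpy as np

def instance_cutmix(scan_pts, scan_lbl, inst_pts, inst_lbl, rng=np.random):
    """Paste a rare-class instance (inst_pts, inst_lbl) into a target scan.

    Simplified sketch: collision checks and ground-aware placement,
    which a full implementation would need, are omitted.
    """
    # Random placement within a ring around the sensor (radii are assumptions).
    radius = rng.uniform(5.0, 30.0)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    target = np.array([radius * np.cos(angle), radius * np.sin(angle), 0.0])

    # Shift the instance so its x-y centroid lands at the target,
    # leaving its original height (z) untouched.
    shift = target - inst_pts.mean(axis=0) * np.array([1.0, 1.0, 0.0])
    moved = inst_pts + shift

    mixed_pts = np.concatenate([scan_pts, moved], axis=0)
    mixed_lbl = np.concatenate([scan_lbl, inst_lbl], axis=0)
    return mixed_pts, mixed_lbl
```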
Performance Evaluation
The results indicate that RPVNet outperforms existing methods by effectively integrating voxel, point, and range data streams. The multi-view fusion strategy yields consistently high mIoU values, significantly above single-view and dual-view baselines. Particularly noteworthy is how efficiently the network handles the computational load of large-scale datasets, pairing high accuracy with real-time capability.
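For reference, mIoU is the per-class intersection-over-union averaged across classes. The sketch below is the standard confusion-matrix computation, not anything specific to RPVNet:

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Compute mIoU from flat prediction/ground-truth label arrays."""
    mask = gt != ignore_index
    pred, gt = pred[mask], gt[mask]
    # Confusion matrix via bincount over combined class indices.
    conf = np.bincount(num_classes * gt + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)       # avoid division by zero
    return iou[union > 0].mean()          # average over classes that appear
```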
Implications and Future Developments
In practice, the implications of this research are substantial for applications involving autonomous driving, urban planning, and robotics. The ability to accurately segment and interpret 3D environments in real-time is critical for these fields, and RPVNet's capabilities point towards more reliable and comprehensive point cloud processing solutions.
Theoretically, the proposed multi-view fusion speaks to how neural networks can be designed to exploit complementary data representations, an idea that may extend to scenarios beyond LiDAR data. The hash-mapping technique for efficient feature propagation is an innovation that other multi-view problems can borrow.
The promising results open several avenues for future work, including introducing additional data views, such as intensity or time-based attributes, into the segmentation process. Further, RPVNet's formulation can serve as a baseline for testing the integration of emerging data types, such as multispectral or thermal imaging, which are increasingly relevant in environmental and surveillance applications.
In conclusion, RPVNet presents a significant step forward in leveraging the complementary strengths of multi-view data representations, yielding a robust and efficient approach to 3D point cloud segmentation. The insights gained from this paper will likely influence future research directions in 3D computer vision and related fields.