- The paper introduces SPLATNet, which uses sparse bilateral convolutional layers on high-dimensional lattices to efficiently handle irregular 3D point cloud data.
- The approach enables joint 2D-3D processing by mapping image features to point clouds, yielding up to a 10.6 IoU improvement in facade segmentation.
- Hierarchical feature learning in SPLATNet captures broad spatial contexts, achieving competitive mIoU scores on ShapeNet with an instance average of 85.4.
SPLATNet: Sparse Lattice Networks for Point Cloud Processing
The paper "SPLATNet: Sparse Lattice Networks for Point Cloud Processing" presents a novel network architecture designed to process point clouds efficiently. This architecture, named SPLATNet (SParse LATtice Networks), addresses the challenges inherent in the irregular formats of point clouds and meshes obtained from modern 3D sensors, such as laser scanners. The core innovation lies in the use of sparse bilateral convolutional layers (BCLs) which operate in a high-dimensional lattice, significantly improving computation and memory efficiency.
Key Contributions
Bilateral Convolution Layers (BCLs): SPLATNet builds its convolutions on BCLs, which can process sparse, unordered sets of 3D points. A BCL works in three steps: it splats the input features onto a higher-dimensional permutohedral lattice, convolves only over the occupied lattice vertices, and slices the filtered signal back onto the original points. Restricting computation to occupied cells keeps the layer efficient and scalable, and the design supports hierarchical, spatially-aware feature learning for both point-based and image-based representations.
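To make the splat-convolve-slice idea concrete, the following is a minimal sketch in NumPy. It substitutes a simple quantized integer grid for the permutohedral lattice used in the paper and a per-cell linear map for the learned sparse convolution; the names splat, convolve_occupied, and slice_back are illustrative, not the authors' implementation.

```python
# Simplified splat-convolve-slice over occupied cells of a quantized grid.
import numpy as np

def splat(points, features, scale):
    """Scatter point features onto occupied cells of a quantized lattice (averaging per cell)."""
    keys = np.floor(points * scale).astype(np.int64)           # integer lattice coordinates
    cells, inv = np.unique(keys, axis=0, return_inverse=True)  # occupied cells only
    summed = np.zeros((len(cells), features.shape[1]))
    np.add.at(summed, inv, features)                            # accumulate features per cell
    counts = np.bincount(inv, minlength=len(cells))[:, None]
    return summed / counts, inv

def convolve_occupied(cell_feats, weight):
    """Stand-in for the sparse lattice convolution: a learned linear map on occupied cells."""
    return cell_feats @ weight

def slice_back(cell_feats, inv):
    """Map filtered cell features back to the original points (nearest-cell lookup)."""
    return cell_feats[inv]

# Toy usage: 1,000 points with 8-dim features, lattice scale 4.
pts = np.random.rand(1000, 3)
feat = np.random.rand(1000, 8)
w = np.random.randn(8, 16)
cell_feat, inv = splat(pts, feat, scale=4.0)
out = slice_back(convolve_occupied(cell_feat, w), inv)          # (1000, 16) per-point features
```

The sketch preserves the crucial property that work is done only on occupied cells, so cost scales with the number of distinct lattice points rather than with the resolution of the embedding space.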
Efficient 2D-3D Joint Processing: SPLATNet provides mechanisms for mapping 2D image features into 3D point-cloud space and back. The resulting joint 2D-3D architecture processes multi-view images and the corresponding 3D point cloud end-to-end, leveraging the strengths of both representations. Dedicated modules, BCL2D→3D and BCL3D→2D, project features between the image and point domains, improving predictions in both.
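The 2D→3D direction can be sketched as follows, assuming a known pinhole camera with intrinsics K and pose R, t (all illustrative assumptions): per-pixel CNN features are gathered onto the 3D points at their projected pixel locations. The paper's BCL2D→3D instead splats pixel features through the lattice, which also aggregates observations from multiple views.

```python
# Hedged sketch: lift 2D feature-map values onto 3D points via a pinhole projection.
import numpy as np

def project(points, K, R, t):
    """Project Nx3 world points to pixel coordinates with intrinsics K and pose (R, t)."""
    cam = points @ R.T + t                  # world -> camera frame
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]           # perspective divide

def lift_2d_to_3d(points, image_feat, K, R, t):
    """Gather the image feature at each point's projected pixel (nearest-neighbour lookup)."""
    uv = np.round(project(points, K, R, t)).astype(int)
    H, W, _ = image_feat.shape
    u = np.clip(uv[:, 0], 0, W - 1)
    v = np.clip(uv[:, 1], 0, H - 1)
    return image_feat[v, u]                 # (N, C) per-point image features

# Toy usage: identity pose, a 64x64 feature map with 32 channels, points in front of the camera.
pts = np.random.rand(500, 3) + np.array([0.0, 0.0, 2.0])
K = np.array([[64.0, 0.0, 32.0], [0.0, 64.0, 32.0], [0.0, 0.0, 1.0]])
feat2d = np.random.rand(64, 64, 32)
point_feat = lift_2d_to_3d(pts, feat2d, K, np.eye(3), np.zeros(3))   # (500, 32)
```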
Hierarchical Feature Learning: The network builds hierarchical and spatially-aware features using a series of BCLs with progressively coarser lattice scales. This approach ensures that deeper layers of the network capture broader spatial relationships, which is crucial for tasks like segmentation that require context from the entire 3D scene.
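A toy illustration of this coarsening, under the assumption of a halving schedule and an average-pool stand-in for the sparse lattice convolution (neither is the paper's exact configuration): the same point set is pooled over progressively coarser lattice cells, and the per-scale responses are concatenated into a multi-scale point descriptor.

```python
# Hedged sketch of hierarchical lattice scales: later stages pool over larger neighbourhoods.
import numpy as np

def lattice_stage(points, features, scale):
    """Average features over occupied lattice cells at this scale and broadcast back to points."""
    keys = np.floor(points * scale).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    pooled = np.zeros((inv.max() + 1, features.shape[1]))
    np.add.at(pooled, inv, features)
    pooled /= np.bincount(inv)[:, None]
    return pooled[inv]                        # per-point context at this scale

pts = np.random.rand(2000, 3)
feat = np.random.rand(2000, 16)
scales = [16.0, 8.0, 4.0, 2.0]                # coarser lattice at each stage (illustrative)
multi_scale = np.concatenate([lattice_stage(pts, feat, s) for s in scales], axis=1)
print(multi_scale.shape)                      # (2000, 64)
```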
Empirical Validation
The effectiveness of SPLATNet is demonstrated on two benchmarks: RueMonge2014 for facade segmentation and ShapeNet for part segmentation. The results show that SPLATNet matches or exceeds prior techniques in segmentation accuracy while remaining computationally efficient.
Results on RueMonge2014
For facade segmentation, SPLATNet improves significantly over prior methods. SPLATNet3D alone outperforms OctNet, an octree-based 3D CNN, by 6.2 IoU points. Leveraging both 2D and 3D data (SPLATNet2D-3D) improves performance by a further 4.4 IoU points, setting a new state of the art on this dataset and highlighting the value of integrating multi-view images with 3D point clouds.
Results on ShapeNet
In the ShapeNet part segmentation task, SPLATNet shows competitive performance with a class average mIoU of 83.7 and an instance average mIoU of 85.4. This is on par with or exceeds the performance of leading methods such as PointNet++ and SyncSpecCNN. Notably, SPLATNet's use of both 2D and 3D data further enhances its segmentation accuracy compared to using 3D data alone.
Theoretical and Practical Implications
SPLATNet's design has notable theoretical implications. By generalizing convolution to high-dimensional spaces through BCLs, it opens avenues for more robust feature extraction from sparse and irregular data structures. The network's ability to handle different lattice spaces and scales introduces flexibility that could be useful beyond point cloud processing, for example in video processing and other applications that integrate multi-dimensional data.
From a practical standpoint, SPLATNet's efficiency in dealing with large-scale 3D datasets has significant implications for applications like autonomous driving, robotic manipulation, and augmented reality, where real-time processing of 3D data is crucial. The demonstrated improvements in segmentation accuracy can lead to better object recognition and scene understanding, directly impacting the performance and safety of such systems.
Future Directions
Future research could explore the integration of additional input features such as texture and more complex lattice structures to further enhance the network's capability. Additionally, investigating the application of SPLATNet in other high-dimensional data processing tasks and extending the framework to support unsupervised or semi-supervised learning could yield further advancements.
In conclusion, SPLATNet represents a significant contribution to the field of point cloud processing, offering a flexible, scalable, and efficient solution for extracting meaningful features from 3D data. Its innovative use of BCLs and joint 2D-3D processing capabilities position it as a valuable tool for a wide range of applications in computer vision and beyond.