PointCNN++: Efficient 3D Convolution
- The paper introduces a point-centric convolution that avoids coarse grid snapping, achieving sub-voxel precision in 3D point cloud processing.
- It leverages an efficient MVMR GPU kernel that minimizes memory overhead and latency by grouping computations and reusing weights.
- Experimental benchmarks show that PointCNN++ outperforms traditional voxel and point-based methods in memory usage, speed, and registration accuracy.
PointCNN++ is a generalized 3D convolutional architecture for point cloud data that operates directly on native, high-precision point coordinates. It eliminates the traditional precision-performance trade-off by unifying the flexibility of point-based approaches with the high efficiency of voxel-based convolutions. Central to the design are a point-centric formulation and a Matrix-Vector Multiplication and Reduction (MVMR) GPU kernel that together deliver accuracy, memory efficiency, and computational performance across a wide range of point cloud learning tasks (Li et al., 28 Nov 2025).
1. Mathematical Formulation of Point-Centric Convolution
PointCNN++ defines convolution on native points with the following components:
- $P = \{p_i\} \subset \mathbb{R}^3$: input point coordinates,
- $F \in \mathbb{R}^{N \times C_{\text{in}}}$: input features,
- $Q = \{q_j\} \subset \mathbb{R}^3$: output (convolution center) points,
- $G \in \mathbb{R}^{M \times C_{\text{out}}}$: output features,
- $W_k \in \mathbb{R}^{C_{\text{out}} \times C_{\text{in}}}$: learnable kernel matrices ($k = 1, \dots, K^3$ for $K \times K \times K$ local kernels).
For each output point $q_j$, a local adaptive voxelization of extent $s$ and resolution $K \times K \times K$ is centered at $q_j$. If input point $p_i$ falls into the $k$-th local cell, the operator records a triplet $(i, j, k)$. Formally, if $\mathcal{T}$ is the triplet set, the convolution is:

$$G_j = \sum_{(i, j, k) \in \mathcal{T}} W_k F_i.$$
This mechanism avoids global quantization entirely; the neighborhood inclusion for each output point is determined by either KNN or radius search on the true input coordinates.
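The triplet construction can be sketched as follows. This is a brute-force illustrative reference, not the paper's implementation; the function name, signature, and cell-assignment details are assumptions:

```python
import numpy as np

def build_triplets(points, centers, size, K):
    """For each output center, place a K x K x K local grid of total extent
    `size` around it and record which input points fall into which local
    cell as (input_idx, output_idx, kernel_idx) triplets.
    Illustrative sketch only; names and signature are assumed."""
    triplets = []
    half = size / 2.0
    cell = size / K
    for j, q in enumerate(centers):
        rel = points - q                              # offsets from this center
        inside = np.all(np.abs(rel) < half, axis=1)   # points within the local box
        for i in np.nonzero(inside)[0]:
            # Map the offset to an integer cell index in [0, K)^3,
            # then flatten it to a single kernel index k.
            ijk = np.minimum(((rel[i] + half) / cell).astype(int), K - 1)
            k = ijk[0] * K * K + ijk[1] * K + ijk[2]
            triplets.append((i, j, k))
    return triplets
```

Because the grid is placed per center at full coordinate precision, no global quantization of the input ever occurs.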
Main advantages over voxel-based convolutions:
- No snapping of centers to coarse grids
- Accurate (non-proxy) neighborhood selection
- Kernel resolution is local and tunable, not globally fixed
2. MVMR—Matrix-Vector Multiplication and Reduction Formalism
PointCNN++ reformulates the convolution as an unstructured sum of small MVMs followed by a reduction over triplet indices. For a global triplet list $\mathcal{T} = \{(i_t, j_t, k_t)\}_{t=1}^{T}$, the operator computes:

$$G_{j_t} \mathrel{+}= W_{k_t} F_{i_t}, \qquad t = 1, \dots, T.$$

Each element of this sum is a small MVM, $W_{k_t} F_{i_t}$, aggregated into $G_{j_t}$ by atomic addition. The computational complexity is $O(T \cdot C_{\text{in}} C_{\text{out}})$, and the main memory cost is streaming reads of kernel weights and input features, with atomic writes into the final output tensor.
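A dense NumPy reference of the MVMR pattern, assuming the shapes implied above (features `(N, C_in)`, weights `(K^3, C_out, C_in)`); the real operator runs as a fused GPU kernel, so this is only a correctness-level sketch:

```python
import numpy as np

def mvmr(triplets, feats, weights, n_out):
    """Reference MVMR: one small matrix-vector product per triplet,
    reduced into the output row selected by the triplet's output index.
    feats: (N, C_in), weights: (K^3, C_out, C_in), output: (n_out, C_out).
    Illustrative sketch, not the fused GPU kernel."""
    c_out = weights.shape[1]
    out = np.zeros((n_out, c_out), dtype=feats.dtype)
    for i, j, k in triplets:
        out[j] += weights[k] @ feats[i]   # small MVM + indexed reduction
    return out
```

Note that no intermediate tensor is materialized: each MVM result is accumulated straight into the output, which is what keeps the memory cost to streaming reads plus output writes.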
3. Dedicated Native-Point GPU Kernel Design
To achieve high throughput and minimal memory usage, PointCNN++ introduces a GPU kernel tailored to the MVMR pattern:
- Sorts triplet list by kernel index to enable weight reuse in on-chip cache.
- Groups consecutive triplets for warp-level execution.
- Tiles kernel matrices into subblocks for efficient on-chip MVM.
- Uses atomic adds only for the output reduction.
- Requires zero intermediate buffers beyond input, output, and kernel tensors.
A high-level algorithmic flow:
- For each block of triplets: load the corresponding kernel weight and feature vector, perform the MVMs, and accumulate in registers.
- Output is written with a single atomic add per output index.
- The default hyperparameters yield robust performance across architectures and tasks.
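The sort-and-group schedule above can be mimicked on the CPU. This sketch (hypothetical names, no warp/tile machinery) only illustrates the weight-reuse pattern: triplets sharing a kernel index run back to back, so each weight matrix is loaded once per group rather than once per triplet:

```python
import numpy as np

def sorted_mvmr(triplets, feats, weights, n_out):
    """CPU mimic of the kernel's schedule: sort the triplet list by kernel
    index so all MVMs sharing a weight matrix W_k run consecutively,
    "loading" each W_k once per group (the on-chip cache reuse the GPU
    kernel exploits). Illustrative only, not the CUDA implementation."""
    t = np.asarray(triplets)
    order = np.argsort(t[:, 2], kind="stable")        # sort by kernel index k
    t = t[order]
    out = np.zeros((n_out, weights.shape[1]), dtype=feats.dtype)
    start = 0
    while start < len(t):
        k = t[start, 2]
        end = start
        while end < len(t) and t[end, 2] == k:        # extent of this k-group
            end += 1
        w = weights[k]                                # load W_k once per group
        for i, j, _ in t[start:end]:
            out[j] += w @ feats[i]                    # atomic add on the GPU
        start = end
    return out
```

On the GPU the per-group accumulation stays in registers and only the final per-output-index write uses an atomic add, matching the flow described above.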
4. Paradigm Comparison with Prior 3D Convolutional Operators
The following table summarizes qualitative distinctions among Voxel-based, Point + Transform (PointCNN/KPConv), and PointCNN++ convolutions:
| Method | Output Centers | Precision | Memory Overhead |
|---|---|---|---|
| Voxel-based | quantized voxel grid | sub-voxel detail lost | coordinate hashmaps |
| Point+Transform | original points | high (but slow) | padded dense tensors |
| PointCNN++ | original input points | full fidelity | none beyond input/output/kernel tensors |
Key differentiation:
PointCNN++ achieves the same theoretical compute complexity as sparse voxel and past point-based convolutions but requires no intermediate dense materialization or memory allocation, retaining true geometric detail at native resolution (Li et al., 28 Nov 2025).
5. Experimental Benchmarking
A. Micro-benchmarks (ResNet-18, RTX 4090, 1M points)
- GPU memory consumption (forward/backward):
- MinkowskiEngine: ~1.2 GB / 2.3 GB
- TorchSparse++: ~1.0 GB / 2.0 GB
- KPConv: ~5.5 GB / 8.0 GB
- PointCNN++: 0.37 GB / 0.59 GB
- Latency per iteration (forward/backward):
- VDB: 49.6 ms / 105.5 ms
- TorchSparse++: 55 ms / 120 ms
- KPConv: 120 ms / 240 ms
- PointCNN++: 60.4 ms / 75.5 ms
B. Point Cloud Registration
- KITTI Odometry (FCGF backbone replacement):
- Relative Translation Error: $0.19 \pm 0.03$ m (best)
- Relative Rotation Error: second best among compared methods
- Registration Recall: best among compared methods
- Parameters: $8.75$M
- 3DMatch (varied samples):
- Registration Recall @5000 pts: 90.3%
- Feature Matching Recall: 98.9% (best)
- Inlier Ratio: 58.2%
These results confirm that PointCNN++, used as a plug-and-play backbone, delivers state-of-the-art registration and matching performance while reducing both memory usage and latency by an order of magnitude relative to other point-based operators (Li et al., 28 Nov 2025).
6. Guidelines for Adoption and Recognized Limitations
Integration: PointCNN++ is directly compatible as a drop-in replacement for sparse-convolution backbones (e.g., MinkowskiEngine); the rest of an existing network architecture requires no changes. Neighborhood selection should use fixed-radius or KNN search on true coordinates, with the local kernel resolution chosen to suit local geometric complexity.
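Neighborhood selection on true coordinates can be prototyped with brute-force distance computations; the function names below are illustrative, and a KD-tree or GPU neighbor search would replace this at scale:

```python
import numpy as np

def radius_neighbors(points, centers, radius):
    """Fixed-radius neighborhood selection on true (unquantized)
    coordinates: for each center, the indices of all input points within
    `radius`. Brute-force NumPy sketch for illustration."""
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return [np.nonzero(row <= radius ** 2)[0] for row in d2]

def knn_neighbors(points, centers, k):
    """KNN alternative: the k nearest input points per center."""
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]
```

Either routine yields, per center, the exact input points whose triplets enter the convolution, with no voxel-grid proxy in between.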
Hyperparameters: The default MVMR kernel settings for group size and tile sizes perform robustly across scenarios and modern hardware.
Advantages:
- Maintains sub-voxel geometric fidelity, crucial for registration, surface normal estimation, and fine segmentation.
- Enables training with larger batch sizes and/or higher point counts due to minimal memory overhead.
- Achieves balanced and competitive latency for both inference and training.
Limitations and Future Directions:
- The sort overhead associated with highly dynamic neighborhoods can become significant; an automated tuner for sort axis may ameliorate this in specialized contexts.
- For extremely large-scale point clouds, external-memory or hierarchical sampling frameworks must be adopted.
- Extension to anisotropic or deformable kernels (as in KPConv) would require additional complexity in triplet set construction.
- Hybrid models incorporating attention or transformer modules atop the PointCNN++ backbone are a prospective avenue for broadening receptive field and context.
7. Context and Impact
PointCNN++ fundamentally generalizes sparse convolution from voxels to points, establishing voxel-based methods as a special, quantized case of point-centric convolution. Its architectural and kernel-level innovations address the geometric fidelity limitations that constrained prior efficient 3D learning paradigms, while establishing new memory and speed baselines. The formulation demonstrates that fine-grained geometric detail and high performance in point cloud deep learning are compatible, and it sets a path for high-fidelity, efficient 3D representation learning across segmentation, detection, and registration pipelines (Li et al., 28 Nov 2025).