
PointCNN++: Efficient 3D Convolution

Updated 5 December 2025
  • The paper introduces a point-centric convolution that avoids coarse grid snapping, achieving sub-voxel precision in 3D point cloud processing.
  • It leverages an efficient MVMR GPU kernel that minimizes memory overhead and latency by grouping computations and reusing weights.
  • Experimental benchmarks show that PointCNN++ outperforms traditional voxel and point-based methods in memory usage, speed, and registration accuracy.

PointCNN++ is a generalized 3D convolutional architecture for point cloud data that directly operates on native, high-precision point coordinates. It eliminates the traditional precision-performance trade-off by unifying the flexibility of point-based approaches with the high efficiency of voxel-based convolutions. Central to the design is a point-centric formulation, and a Matrix-Vector Multiplication and Reduction (MVMR) GPU kernel that brings together accuracy, memory efficiency, and computational performance across a wide range of point cloud learning tasks (Li et al., 28 Nov 2025).

1. Mathematical Formulation of Point-Centric Convolution

PointCNN++ defines convolution on native points with the following components:

  • $P^{in} = \{p_j \in \mathbb{R}^3\}_{j=1}^{N_{in}}$: input point coordinates,
  • $F^{in} = \{f_j \in \mathbb{R}^{C_{in}}\}_{j=1}^{N_{in}}$: input features,
  • $P^{out} = \{q_i \in \mathbb{R}^3\}_{i=1}^{N_{out}}$: output (convolution center) points,
  • $F^{out} = \{g_i \in \mathbb{R}^{C_{out}}\}_{i=1}^{N_{out}}$: output features,
  • $W = \{W_k \in \mathbb{R}^{C_{out} \times C_{in}}\}_{k=1}^{K}$: $K$ learnable kernel matrices ($K = t^3$ for $t \times t \times t$ local kernels).

For each output point $q_i$, a local adaptive voxelization of size $t$ and resolution $t^3$ is centered at $q_i$. If input point $p_j$ falls into the $k$-th local cell, the operator records a triplet $(i, j, k)$. Formally, if $T \subset \{1 \ldots N_{out}\} \times \{1 \ldots N_{in}\} \times \{1 \ldots K\}$ is the triplet set, the convolution is:

$$g_i = \sum_{(i, j, k) \in T} W_k f_j$$

This mechanism avoids global quantization entirely, and the neighborhood inclusion for each $q_i$ is determined either by KNN or radius search on true input coordinates.
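
As an illustration, the triplet construction and the convolution sum above can be sketched in NumPy. This is a brute-force CPU sketch under assumed conventions (the function names, the radius search, and the offset-to-cell mapping are illustrative choices, not the paper's implementation):

```python
import numpy as np

def build_triplets(p_in, q_out, t, radius):
    """Build (i, j, k) triplets: for each output center q_i, collect input
    points p_j within `radius` and assign each to one of the t*t*t cells of
    a local voxel grid centered at q_i. Brute force, for illustration only."""
    triplets = []
    cell = 2.0 * radius / t                      # local cell edge length
    for i, q in enumerate(q_out):
        d = p_in - q                             # offsets to the center
        mask = np.linalg.norm(d, axis=1) <= radius
        for j in np.nonzero(mask)[0]:
            # quantize the local *offset* (not the global point) into the grid
            idx = np.clip(((d[j] + radius) // cell).astype(int), 0, t - 1)
            k = idx[0] * t * t + idx[1] * t + idx[2]
            triplets.append((i, j, k))
    return triplets

def point_conv(f_in, W, triplets, n_out):
    """g_i = sum over (i, j, k) in T of W_k f_j, with W of shape (K, C_out, C_in)."""
    g = np.zeros((n_out, W.shape[1]))
    for i, j, k in triplets:
        g[i] += W[k] @ f_in[j]
    return g
```

Note that only the offsets $p_j - q_i$ are quantized, locally and per center; the global coordinates are never snapped to a grid.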

Main advantages over voxel-based convolutions:

  • No snapping of centers to coarse grids
  • Accurate (non-proxy) neighborhood selection
  • Kernel resolution is local and tunable, not globally fixed

2. MVMR—Matrix-Vector Multiplication and Reduction Formalism

PointCNN++ reformulates the convolution as an unstructured sum of small MVMs followed by a reduction over triplet indices. For a global triplet list $T = [(i_1, j_1, k_1), \ldots, (i_{|T|}, j_{|T|}, k_{|T|})]$:

$$F^{out}[i] = \sum_{\ell : i_\ell = i} W_{k_\ell} F^{in}[j_\ell]$$

Each element of this sum is a small MVM, $y_\ell = W_{k_\ell} F^{in}[j_\ell]$, aggregated into $F^{out}[i_\ell]$ by atomic addition. The computational complexity is $O(|T| \cdot C_{out} \cdot C_{in})$, and the main memory cost is streaming reads of kernel weights and input features, with writes into the final output tensor.
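
A minimal NumPy sketch of this pattern, with `np.add.at` standing in for the GPU's atomic addition (the vectorized `einsum` formulation is an illustrative choice, not the paper's kernel):

```python
import numpy as np

def mvmr(f_in, W, i_idx, j_idx, k_idx, n_out):
    """MVMR: y_l = W_{k_l} F_in[j_l] for every triplet l, then reduce the
    y_l into F_out[i_l]. W has shape (K, C_out, C_in)."""
    # (|T|, C_out): one small matrix-vector product per triplet
    y = np.einsum('loc,lc->lo', W[k_idx], f_in[j_idx])
    f_out = np.zeros((n_out, W.shape[1]))
    np.add.at(f_out, i_idx, y)                   # unbuffered scatter-add over i_l
    return f_out
```

Besides the output tensor itself, no intermediate structure larger than the per-triplet products is needed, mirroring the zero-buffer property of the GPU kernel.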

3. Dedicated Native-Point GPU Kernel Design

To achieve high throughput and minimal memory usage, PointCNN++ introduces a GPU kernel tailored to the MVMR pattern:

  • Sorts the triplet list $T$ by kernel index $k$ to enable weight reuse in on-chip cache.
  • Groups $L$ consecutive triplets for warp-level execution.
  • Tiles kernel matrices into subblocks of $(B_{out}, B_{in})$ for efficient on-chip MVM.
  • Uses atomic adds only for the output reduction.
  • Requires zero intermediate buffers beyond input, output, and kernel tensors.

A high-level algorithmic flow:

  • For each block of triplets: load kernel weight $W[k]$ and feature $F^{in}[j]$, perform MVMs, and accumulate in a register.
  • Output is written with a single atomic add per output index.
  • Hyperparameters $(L = 128, B_{out} = 32, B_{in} = 32)$ yield robust performance across architectures and tasks.
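
The sort-by-$k$ weight-reuse idea can be mimicked on the CPU by grouping triplets per kernel index and issuing one batched matrix product per group. This is a sketch of the scheduling idea only, not the CUDA kernel; the stable sort and per-group batching are illustrative:

```python
import numpy as np

def mvmr_sorted(f_in, W, triplets, n_out):
    """Process triplets sorted by kernel index k, so each W_k is fetched
    once and reused across its whole group (mimicking on-chip caching).
    `triplets` is an integer array of shape (|T|, 3) holding (i, j, k)."""
    order = np.argsort(triplets[:, 2], kind='stable')
    i_s, j_s, k_s = triplets[order].T
    f_out = np.zeros((n_out, W.shape[1]))
    for k in np.unique(k_s):
        sel = k_s == k
        y = f_in[j_s[sel]] @ W[k].T              # one weight load, many MVMs
        np.add.at(f_out, i_s[sel], y)            # reduction into the output
    return f_out
```

On a GPU, the same grouping means consecutive warps hit the same cached $W_k$ tile instead of streaming a different kernel matrix per triplet.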

4. Paradigm Comparison with Prior 3D Convolutional Operators

The following table summarizes qualitative distinctions among Voxel-based, Point + Transform (PointCNN/KPConv), and PointCNN++ convolutions:

| Method | Output Centers $q_i$ | Precision | Memory Overhead |
|---|---|---|---|
| Voxel-based | quantized voxel grid | sub-voxel lost | $O(N_{grid})$ + hashmaps |
| Point + Transform | points $\to$ dense tensor | high, slow | $O(N K C_{in})$ (padded) |
| PointCNN++ | original input points | full fidelity | 0 extra (beyond tensors) |

Key differentiation:

PointCNN++ achieves the same theoretical compute complexity as sparse voxel and past point-based convolutions but requires no intermediate dense materialization or memory allocation, retaining true geometric detail at native resolution (Li et al., 28 Nov 2025).

5. Experimental Benchmarking

A. Micro-benchmarks (ResNet-18, $3^3$ kernel, $C_{in} = 64$, $C_{out} = 128$, RTX 4090, 1M points)

  • GPU memory consumption (forward/backward):
    • MinkowskiEngine: ~1.2 GB / 2.3 GB
    • TorchSparse++: ~1.0 GB / 2.0 GB
    • KPConv: ~5.5 GB / 8.0 GB
    • PointCNN++: 0.37 GB / 0.59 GB
  • Latency per iteration (forward/backward):
    • ffVDB: 49.6 ms / 105.5 ms
    • TorchSparse++: 55 ms / 120 ms
    • KPConv: 120 ms / 240 ms
    • PointCNN++: 60.4 ms / 75.5 ms

B. Point Cloud Registration

  • KITTI Odometry (FCGF backbone replacement):
    • Relative Translation Error: $0.19 \pm 0.03$ m (best)
    • Relative Rotation Error: $0.060^\circ \pm 0.10^\circ$ (2nd best)
    • Recall @ $(0.2\,\text{m}, 1^\circ)$: 99.8% (best)
    • Parameters: 8.75M
  • 3DMatch (varied samples):
    • Registration Recall @5000 pts: 90.3%
    • Feature Matching Recall: 98.9% (best)
    • Inlier Ratio: 58.2%

These results confirm that PointCNN++, as a plug-and-play backbone, delivers state-of-the-art registration and matching performance while reducing both memory usage and latency by an order of magnitude over other point-based operators (Li et al., 28 Nov 2025).

6. Guidelines for Adoption and Recognized Limitations

Integration: PointCNN++ is directly compatible as a drop-in replacement for sparse-convolution backbones (e.g., MinkowskiEngine); the rest of an existing network architecture requires no changes. Neighborhood selection should use fixed-radius or KNN search on true coordinates, with the kernel resolution $t$ chosen to suit local complexity.
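
Both neighborhood-selection modes operate on true coordinates rather than a grid proxy. A brute-force NumPy sketch of the two options (real pipelines would use a spatial index such as a k-d tree; these helper names are illustrative):

```python
import numpy as np

def knn_neighbors(p_in, q_out, k):
    """KNN on true coordinates: indices of the k nearest input points
    for each output center. O(N_out * N_in) brute force, for illustration."""
    d = np.linalg.norm(q_out[:, None, :] - p_in[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]

def radius_neighbors(p_in, q_out, r):
    """Fixed-radius search: per-center lists of input indices within r."""
    d = np.linalg.norm(q_out[:, None, :] - p_in[None, :, :], axis=-1)
    return [np.nonzero(row <= r)[0] for row in d]
```

Either result feeds directly into triplet construction, since each selected neighbor only needs its local cell index relative to the center.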

Hyperparameters: The default MVMR kernel settings (group size $L = 128$, tile sizes $B_{out} = 32$, $B_{in} = 32$) perform robustly across scenarios and modern hardware.

Advantages:

  • Maintains sub-voxel geometric fidelity, crucial for registration, surface normal estimation, and fine segmentation.
  • Enables training with larger batch sizes and/or higher point counts due to minimal memory overhead.
  • Achieves balanced and competitive latency for both inference and training.

Limitations and Future Directions:

  • The sort overhead associated with highly dynamic neighborhoods can become significant; an automated tuner for sort axis may ameliorate this in specialized contexts.
  • For extremely large-scale point clouds (e.g., >100>100M points), external-memory or hierarchical sampling frameworks must be adopted.
  • Extension to anisotropic or deformable kernels (as in KPConv) would require additional complexity in triplet set construction.
  • Hybrid models incorporating attention or transformer modules atop the PointCNN++ backbone are a prospective avenue for broadening receptive field and context.

7. Context and Impact

PointCNN++ fundamentally generalizes sparse convolution from voxels to points, establishing voxel-based methods as a special, quantized case of point-centric convolution. Its architectural and kernel-level innovations address the geometric fidelity limitations that constrained prior efficient 3D learning paradigms, while establishing new memory and speed baselines. The formulation demonstrates that fine-grained geometric detail and high performance in point cloud deep learning are compatible and sets a path for high-fidelity, efficient 3D representation learning across segmentation, detection, and registration pipelines (Li et al., 28 Nov 2025).

