
Kernel Point Convolutions (KPConv)

Updated 23 November 2025
  • Kernel Point Convolutions (KPConv) are spatially-grounded operators that directly process unstructured 3D point clouds by leveraging continuous filters based on local geometric context.
  • KPConv variants—including rigid, deformable, depthwise (KPConvD), and attention-based (KPConvX) versions—offer practical trade-offs in efficiency, flexibility, and invariance to Euclidean transformations.
  • Widely applied in classification, segmentation, and registration, KPConv architectures have achieved state-of-the-art results on benchmarks such as ModelNet40 and ShapeNetPart.

Kernel Point Convolutions (KPConv) are spatially-grounded convolutional operators designed for deep learning on unstructured 3D point clouds. Instead of relying on voxelization or graph-based intermediates, KPConv directly defines continuous filters using collections of kernel points placed within a local neighborhood, leveraging geometric context for feature aggregation. Recent advances include deformable variants, exact equivariance/invariance via frame averaging, and kernel attention mechanisms.

1. Mathematical Foundation and Standard Formulations

KPConv operates on a set of input points $X = \{ x_i \in \mathbb{R}^d \}_{i=1}^n$ with associated per-point features $F^{in} \in \mathbb{R}^{n \times c_{in}}$. For each query point $q \in \mathbb{R}^d$, a radius-based neighborhood $N_q = \{ i : \| x_i - q \| \le r \}$ is identified.

Each KPConv kernel consists of $K$ kernel points $\{ \tilde{x}_k \in \mathbb{R}^d : \| \tilde{x}_k \| \le r \}_{k=1}^K$, each assigned a learnable weight matrix $W_k \in \mathbb{R}^{c_{in} \times c_{out}}$. The continuous kernel response is parameterized by the tent function:

$$l(y_i, \tilde{x}_k) = \max\left(0,\; 1 - \frac{\| y_i - \tilde{x}_k \|}{\sigma}\right)$$

with $y_i = x_i - q$, and $\sigma > 0$ controlling the influence radius. The output feature at $q$ is:

$$f^{out}(q) = \sum_{i \in N_q} \left( \sum_{k=1}^K l(y_i, \tilde{x}_k)\, W_k \right) F^{in}_i$$

Kernel points are initialized in regular geometric configurations (e.g., polyhedron vertices), with the layer-wise $\sigma$ proportional to the grid subsampling cell size. For rigid KPConv, kernel point positions are fixed during training. For deformable KPConv, positions adapt dynamically via learned offsets per query location (Thomas et al., 2019).
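
The formulation above can be made concrete with a minimal NumPy sketch of a rigid KPConv response at a single query point; the function name, argument shapes, and variable names are illustrative, not taken from a specific implementation.

```python
import numpy as np

def rigid_kpconv_at_query(q, points, feats, kernel_pts, W, radius, sigma):
    """Rigid KPConv response at a single query point q.

    q          : (d,)             query location
    points     : (n, d)           input point cloud X
    feats      : (n, c_in)        per-point input features F^in
    kernel_pts : (K, d)           fixed kernel point positions x~_k
    W          : (K, c_in, c_out) one learnable weight matrix per kernel point
    radius     : float            neighborhood radius r
    sigma      : float            kernel point influence distance
    """
    # Radius neighborhood N_q = {i : ||x_i - q|| <= r}, expressed as offsets y_i = x_i - q
    y = points - q
    mask = np.linalg.norm(y, axis=1) <= radius
    y, f = y[mask], feats[mask]

    # Tent correlation l(y_i, x~_k) = max(0, 1 - ||y_i - x~_k|| / sigma), shape (n_q, K)
    dists = np.linalg.norm(y[:, None, :] - kernel_pts[None, :, :], axis=-1)
    corr = np.maximum(0.0, 1.0 - dists / sigma)

    # f_out(q) = sum_i sum_k l(y_i, x~_k) * (W_k applied to F^in_i), shape (c_out,)
    return np.einsum('ik,ic,kco->o', corr, f, W)
```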

2. Rigid, Deformable, and Modern KPConv Variants

Rigid KPConv fixes kernel point locations, offering fast inference and strong geometric priors. Deformable KPConv introduces learnable offsets $\Delta_k(q)$ for each kernel point via a dedicated offset prediction branch, yielding:

$$f^{out}(q) = \sum_{i \in N_q} \sum_{k=1}^K l\left( y_i,\; \tilde{x}_k + \Delta_k(q) \right) W_k\, F^{in}_i$$

Regularization terms $L_{fit}$ and $L_{rep}$ encourage kernel points to remain well distributed over the input neighborhood, preventing collapse onto a single location or drift into empty regions. Empirical ablations show deformable KPConv sustains higher performance with smaller kernel counts or challenging receptive fields (Thomas et al., 2019).
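
As a rough PyTorch sketch of how such regularizers can be written, the functions below implement a fitting term (pulling each deformed kernel point toward its closest input neighbor) and a repulsive term (pushing overlapping kernel points apart); the exact weighting and thresholds in the original paper may differ.

```python
import torch

def kp_fitting_loss(neighbors_y, deformed_kpts, sigma):
    """Fitting term: pull each deformed kernel point toward its closest input
    neighbor, so kernel points do not drift into empty space.

    neighbors_y   : (n_q, d) neighbor offsets y_i = x_i - q for one query point
    deformed_kpts : (K, d)   deformed kernel positions x~_k + Delta_k(q)
    """
    d = torch.cdist(deformed_kpts, neighbors_y)              # pairwise distances (K, n_q)
    return ((d.min(dim=1).values / sigma) ** 2).sum()

def kp_repulsive_loss(deformed_kpts, sigma):
    """Repulsive term: penalize pairs of kernel points whose influence areas
    overlap, preventing collapse onto a single location."""
    K = deformed_kpts.shape[0]
    d = torch.cdist(deformed_kpts, deformed_kpts)             # (K, K)
    overlap = torch.clamp(1.0 - d / sigma, min=0.0) ** 2      # > 0 only when closer than sigma
    off_diag = ~torch.eye(K, dtype=torch.bool, device=deformed_kpts.device)
    return overlap[off_diag].sum()
```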

Modern variants include:

  • KPConvD (Depthwise KPConv) simplifies $W_k$ from matrices to vectors $w_k \in \mathbb{R}^C$, using channel-wise Hadamard products for greater computational efficiency and lower parameter counts.
  • KPConvX augments KPConvD with spatial kernel attention, generating modulation vectors $m_k$ per kernel point from the query feature via a small MLP. This attention (constrained by sigmoid activations) adapts the geometric filtering to local context, improving segmentation and classification accuracy (Thomas et al., 21 May 2024); a sketch of both variants follows this list.
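
To make the difference concrete, the PyTorch sketch below contrasts KPConvD-style depthwise aggregation (one vector $w_k$ per kernel point, applied channel-wise) with a KPConvX-style option that additionally predicts sigmoid-constrained modulation vectors $m_k$ from the query feature; the module name and the small MLP architecture are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class DepthwiseKPAggregation(nn.Module):
    """KPConvD-style aggregation with optional KPConvX-style kernel attention."""
    def __init__(self, num_kpoints, channels, attention=False):
        super().__init__()
        # One weight *vector* per kernel point instead of a full matrix
        self.w = nn.Parameter(torch.randn(num_kpoints, channels) * 0.02)
        # Optional attention branch: modulation vectors m_k from the query feature
        self.attn = nn.Sequential(
            nn.Linear(channels, num_kpoints * channels), nn.Sigmoid()
        ) if attention else None

    def forward(self, corr, neigh_feats, query_feat):
        """
        corr        : (n_q, K)  kernel correlations l(y_i, x~_k)
        neigh_feats : (n_q, C)  neighbor features
        query_feat  : (C,)      feature at the query point
        """
        w = self.w                                            # (K, C)
        if self.attn is not None:
            m = self.attn(query_feat).view(*w.shape)          # (K, C), values in (0, 1)
            w = w * m                                         # context-modulated kernel weights
        # sum_i sum_k corr[i, k] * (w_k ⊙ f_i), channel-wise Hadamard product
        return torch.einsum('ik,kc,ic->c', corr, w, neigh_feats)
```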

3. Embedding Euclidean Symmetries: Frame Averaging and FA-KPConv

While the original KPConv is only approximately invariant/equivariant to Euclidean transformations (translation, rotation, reflection), FA-KPConv introduces exact invariance/equivariance via group symmetrization (Alawieh et al., 7 May 2025). For a transformation group $G$, a function $f$ is $G$-equivariant if $f(g \cdot X) = g' \cdot f(X)$ for every $g \in G$, where $g'$ is the corresponding transformation acting on the output, and $G$-invariant if $g'$ is the identity.

Exact symmetrization over $G$ is computationally intractable for infinite or large groups. Frame Averaging substitutes $G$ with a finite, input-dependent set $F(X)$ constructed from the eigendecomposition of the centered covariance of $X$:

$$c = \frac{1}{n} X^T \mathbf{1}, \quad C = (X - \mathbf{1} c^T)^T (X - \mathbf{1} c^T), \quad C = Q \Lambda Q^T$$

The frame $F(X)$ contains up to $2^d$ orthogonal matrices (for $d = 3$, $|F| = 8$ for $O(3)$). The FA-KPConv wrapper executes $|F(X)|$ forward passes with transformed inputs and features (and an output re-transformation in the equivariant case):

  • Equivariant: $\hat{f}(X, F^{in}) = \frac{1}{|F(X)|} \sum_{g \in F(X)} g \cdot f\left( g^{-1} \cdot X,\; g^{-1} \cdot F^{in} \right)$
  • Invariant: $\bar{f}(X, F^{in}) = \frac{1}{|F(X)|} \sum_{g \in F(X)} f\left( g^{-1} \cdot X,\; g^{-1} \cdot F^{in} \right)$

No extra parameters are introduced. The computational overhead is linear in $|F(X)|$, plus one eigendecomposition per point cloud. This construction maintains full input information and parameter efficiency (Alawieh et al., 7 May 2025).
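
A minimal sketch of the invariant wrapper, assuming a generic KPConv backbone callable, scalar features that need no re-transformation, and an $O(3)$ frame built from sign flips of the PCA eigenvectors; degenerate eigenvalues and the equivariant output re-transformation are ignored for brevity.

```python
import itertools
import numpy as np

def build_frame(points):
    """Input-dependent frame F(X): centroid c plus up to 2^d orthogonal
    matrices obtained by flipping the signs of the PCA eigenvectors."""
    c = points.mean(axis=0)                                   # centroid
    centered = points - c
    C = centered.T @ centered                                 # (unnormalized) covariance
    _, Q = np.linalg.eigh(C)                                  # eigenvectors as columns of Q
    d = points.shape[1]
    frames = [Q * np.array(signs)                             # flip eigenvector directions
              for signs in itertools.product([1.0, -1.0], repeat=d)]
    return c, frames                                          # |frames| = 2^d (8 for d = 3)

def fa_invariant(backbone, points, feats):
    """Invariant FA wrapper: average backbone outputs over all frame elements.
    `backbone(points, feats)` is any KPConv network returning a global output."""
    c, frames = build_frame(points)
    outs = [backbone((points - c) @ Q, feats) for Q in frames]
    return np.mean(outs, axis=0)
```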

4. Implementation, Hyperparameters, and Network Topology

Key implementation details include:

  • Radius-based neighborhood search using efficient spatial structures (KD-trees, grids).
  • Subsampling via grid barycentric aggregation to increase receptive fields and control point cloud density (see the sketch after this list).
  • Strided KPConv leverages subsampled center sets $P'$ for downsampling convolution stages.
  • Feature stacking follows standard encoder-decoder topologies with skip links and 1x1 convolutions, akin to U-Net architectures.
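
A minimal NumPy sketch of barycentric grid subsampling, the operation referenced above: points falling into the same grid cell are replaced by their barycenter and their features are averaged. The function name and shapes are illustrative.

```python
import numpy as np

def grid_subsample(points, feats, cell_size):
    """Replace all points in each grid cell by their barycenter / mean feature."""
    # Integer cell index per point
    cells = np.floor(points / cell_size).astype(np.int64)
    # Group points by cell via unique rows
    _, inverse, counts = np.unique(cells, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)            # guard against shape differences across NumPy versions
    n_cells = counts.shape[0]
    # Accumulate per-cell sums, then divide by counts to get barycenters / mean features
    sub_points = np.zeros((n_cells, points.shape[1]))
    sub_feats = np.zeros((n_cells, feats.shape[1]))
    np.add.at(sub_points, inverse, points)
    np.add.at(sub_feats, inverse, feats)
    return sub_points / counts[:, None], sub_feats / counts[:, None]
```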

Hyperparameters:

  • Number of kernel points $K$ (typically $K = 15$).
  • Influence radius $\sigma$, set per layer proportional to the grid subsampling cell size (ratio $\Sigma = 1.0$).
  • Deformable convolution radius factor ($\rho = 5.0$); how these radii follow from the grid cell size is sketched after this list.
  • Choice of transformation group $G$ for FA-KPConv invariance/equivariance.
  • For KPConvD/KPConvX: multi-shell kernel arrangements, number of attention groups $G$, and modular MLP architectures.
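
The relationship between the grid cell size and the radii above can be collected in a short, illustrative helper; the doubling of the cell size per stage and the constants $K = 15$, $\Sigma = 1.0$, $\rho = 5.0$ follow the values quoted in this section, while the function itself is a hypothetical sketch.

```python
def layer_hparams(first_cell_size, num_layers, K=15, Sigma=1.0, rho=5.0):
    """Derive per-layer KPConv settings from the initial grid cell size dl_0.
    The cell size doubles at each strided stage; sigma and the convolution
    radius scale proportionally (sigma = Sigma * dl, r = rho * dl)."""
    layers = []
    for layer in range(num_layers):
        dl = first_cell_size * (2 ** layer)       # grid cell size at this stage
        layers.append({
            "num_kernel_points": K,
            "cell_size": dl,
            "sigma": Sigma * dl,                   # kernel point influence radius
            "conv_radius": rho * dl,               # neighborhood radius (deformable factor)
        })
    return layers

# Example: a 5-stage encoder starting from a 4 cm grid
print(layer_hparams(0.04, 5))
```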

5. Benchmark Results, Ablations, and Performance Analysis

Empirical results indicate:

  • ModelNet40 (object classification): Rigid KPConv achieves 92.9% OA, deformable KPConv 92.7% OA, both surpassing prior point-based methods (Thomas et al., 2019).
  • ShapeNetPart (part segmentation): instance-averaged IoU of approximately 86.2%.
  • S3DIS Area 5 (scene segmentation): mIoU 67.1% (deformable), with KPConvX-L reaching 73.5%, outperforming major competitors (Thomas et al., 21 May 2024).
  • ScanNetv2 (semantic segmentation): KPConvX-L achieves mIoU 76.3% with only 13.5M params.

Ablation studies confirm the benefits of depthwise filtering (KPConvD: memory usage reduced by roughly 45%, throughput roughly doubled, negligible accuracy loss), nearest-kernel assignment (more than 2× throughput increase), and kernel attention (minor accuracy gain, increased robustness) (Thomas et al., 21 May 2024).

FA-KPConv analysis shows substantial robustness to test-time rotations and strong sample efficiency, especially in scarce data regimes or challenging registration problems (Alawieh et al., 7 May 2025). Limitation: enforced invariance may marginally restrict model capacity for canonical tasks with large data.

6. Open Questions, Limitations, and Research Directions

Current limitations and research challenges:

  • Sensitivity to kernel arrangement, radius, and attention grouping. Hyperparameter tuning is non-trivial.
  • No inherent long-range topological attention; KPConvX is geometrically local. Prospects for hybridization with transformer-based neighbor-wise attention remain open (Thomas et al., 21 May 2024).
  • Frame averaging overhead scales linearly with F(X)|F(X)|; efficient approximate/hierarchical frames are proposed as future work (Alawieh et al., 7 May 2025).
  • When test data is already in a canonical pose, exact invariance may restrict expressive power (e.g., on the original ModelNet40 test set).
  • Plateau in accuracy gains above certain kernel counts or shell arrangements.

Future explorations include mixed invariance/equivariance objectives, deformable symmetry-constrained kernels, extension to point cloud segmentation and large-scale scene understanding, and incorporation of topological attention mechanisms.

7. Context and Impact in 3D Point Cloud Learning

KPConv is recognized as a foundational spatial convolution primitive for 3D point cloud analysis, with demonstrated superiority over point-wise MLPs in both efficiency and geometric bias (Thomas et al., 2019). Frame averaging (FA-KPConv) and kernel attention (KPConvX) represent significant methodological extensions, embedding exact Euclidean symmetry and context-driven modulation without increased parameter count.

Across classification, segmentation, and registration tasks, KPConv-based architectures consistently report state-of-the-art results, marking them as central components within modern 3D deep learning pipelines. Ongoing research continues to refine their flexibility, robustness, and computational efficiency (Alawieh et al., 7 May 2025, Thomas et al., 21 May 2024).
