
Compact Geometric Features (CGF)

Updated 27 February 2026
  • Compact Geometric Features (CGF) are learned descriptors that encapsulate local 3D geometry from unstructured point clouds using spherical histograms.
  • They use a deep multilayer perceptron to map high-dimensional histograms into compact Euclidean spaces, optimized with a triplet loss for discriminative power.
  • CGF achieves high precision, compactness, and fast query times compared to traditional hand-crafted descriptors, making it ideal for robotics and 3D vision applications.

Compact Geometric Features (CGF) are learned representations that encapsulate the local geometry around a point in an unstructured point cloud. CGF is designed to facilitate geometric registration tasks central to robotics and 3D vision, where matching and aligning scans from different viewpoints or temporally separated acquisitions is essential. Unlike prior hand-crafted descriptors, CGF achieves precision, compactness, and robustness through a pipeline that maps high-dimensional spherical histograms of point neighborhoods into low-dimensional Euclidean feature spaces using deep neural networks, specifically a multi-layer perceptron (MLP) trained with a triplet embedding loss (Khoury et al., 2017).

1. Local Spherical Histogram Construction

Given an unstructured point cloud $P \subset \mathbb{R}^3$ and a central point $p \in P$, the method defines a spherical support region $S(p) = \{x : \|x - p\| \le r\}$ of radius $r$. A local reference frame at $p$ is estimated by computing the normal $n_p$ (e.g., by principal component analysis on a small neighborhood) and two orthogonal tangent vectors, ensuring a right-handed coordinate system.

Each neighbor $q \in P \cap S(p)$ is converted to spherical coordinates $(\rho, \theta, \phi)$, where $\rho = \|q - p\|$, $\theta$ is elevation, and $\phi$ is azimuth. The spherical region is discretized into:

  • $R$ radial bins: thresholds are logarithmically spaced between $r_{\min}$ and $r$,

$$r_i = \exp\left(\ln r_{\min} + \frac{i}{R}\ln\frac{r}{r_{\min}}\right), \quad i = 0, \dots, R$$

  • $E$ elevation bins: uniform, each of extent $\pi/E$
  • $A$ azimuth bins: uniform, each of extent $2\pi/A$

With these subdivisions, the total number of bins is $N = R \times E \times A$. A normalized histogram $h_p \in \mathbb{R}^N$ is built for each $p$,

$$[h_p]_{u,v,w} = \frac{1}{|N_p|} \sum_{q \in N_p} \mathcal{B}_{u,v,w}(q)$$

where $\mathcal{B}_{u,v,w}(q)$ indicates the bin assignment of $q$, and $N_p$ is the set of points in the support region (excluding $p$ itself). This histogram serves as the high-dimensional representation of the local geometric context.
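The construction above can be sketched in NumPy. This is a simplified illustration, not the reference implementation: it assumes the local reference frame coincides with the world axes (CGF instead aligns the frame to the estimated normal $n_p$), and all bin parameters are the defaults reported later in this article.

```python
import numpy as np

def spherical_histogram(points, p, r, r_min, R=17, E=11, A=12):
    """Spherical histogram h_p around center p. Simplifying assumption:
    the local reference frame is the world frame (CGF aligns it to the
    estimated normal n_p instead)."""
    d = points - p
    rho = np.linalg.norm(d, axis=1)
    keep = (rho > 0) & (rho <= r)                  # support S(p), excluding p itself
    d, rho = d[keep], rho[keep]
    # logarithmically spaced radial thresholds r_i, i = 0..R
    edges = np.exp(np.log(r_min) + np.arange(R + 1) / R * np.log(r / r_min))
    u = np.clip(np.searchsorted(edges, rho) - 1, 0, R - 1)   # radial bin
    theta = np.arccos(np.clip(d[:, 2] / rho, -1.0, 1.0))     # elevation in [0, pi]
    v = np.minimum((theta / np.pi * E).astype(int), E - 1)
    phi = np.arctan2(d[:, 1], d[:, 0]) + np.pi               # azimuth in (0, 2*pi]
    w = np.minimum((phi / (2 * np.pi) * A).astype(int), A - 1)
    h = np.zeros(R * E * A)
    np.add.at(h, (u * E + v) * A + w, 1.0)         # one count per neighbor q
    return h / max(len(rho), 1)                    # normalize by |N_p|
```

Note that `np.add.at` is used for the scatter-add because plain fancy-index assignment would drop repeated bin indices.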

2. Deep Nonlinear Embedding of Histograms

To achieve compactness, the constructed histogram $h_p$ is mapped to a much lower-dimensional Euclidean space $\mathbb{R}^D$ via a learned nonlinear embedding $f : \mathbb{R}^N \to \mathbb{R}^D$, where typically $D \ll N$. The embedding is implemented as a fully connected MLP with the following architecture:

  • Input layer: size $N$
  • Five hidden layers: 512 ReLU units each
  • Output layer: size $D$, linear activation

The forward operation is

$$z^{(\ell+1)} = \max\left(0,\, W^{(\ell)} z^{(\ell)} + b^{(\ell)}\right), \quad \ell = 0, \dots, 4$$

and the output is $f(h_p) = W^{(5)} z^{(5)} + b^{(5)} \in \mathbb{R}^D$, where $z^{(0)} = h_p$. The feature dimensionality $D$ is a tunable parameter; in practice, values such as 12, 32, and 64 are used to balance compactness against discriminative power. The resulting features are denoted CGF-$D$.
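The forward pass can be written in a few lines of NumPy. The random He-style initialization below is a placeholder of my own for illustration; in CGF the parameters come out of the triplet training described next.

```python
import numpy as np

def init_params(N=2244, D=32, hidden=512, n_hidden=5, seed=0):
    """Random (untrained) parameters matching the CGF architecture;
    the real weights would come from triplet-loss training."""
    rng = np.random.default_rng(seed)
    sizes = [N] + [hidden] * n_hidden + [D]
    Ws = [rng.standard_normal((o, i)) * np.sqrt(2.0 / i)
          for i, o in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(o) for o in sizes[1:]]
    return Ws, bs

def cgf_forward(h, Ws, bs):
    """f(h_p): five 512-unit ReLU hidden layers, then a linear D-dim output."""
    z = h
    for W, b in zip(Ws[:-1], bs[:-1]):   # hidden layers z^(1)..z^(5)
        z = np.maximum(0.0, W @ z + b)
    return Ws[-1] @ z + bs[-1]           # linear output layer
```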

3. Triplet Loss Metric Learning

The MLP parameters are optimized using a margin-based triplet loss. Training data consist of triplets $(x^a, x^p, x^n)$, with $x^a$ (anchor) and $x^p$ (positive) being histograms from true correspondences (distance at most $\tau$ under the ground-truth alignment) and $x^n$ (negative) drawn from non-correspondences (distance in $[\tau, 2\tau]$). The objective is

$$\mathcal{L}(\Theta) = \frac{1}{|\mathcal{T}|} \sum_{(x^a, x^p, x^n) \in \mathcal{T}} \Bigl[\, \|f(x^a;\Theta) - f(x^p;\Theta)\|^2 - \|f(x^a;\Theta) - f(x^n;\Theta)\|^2 + 1 \,\Bigr]_+$$

where $[\,\cdot\,]_+ = \max(0, \cdot)$ and $\Theta$ denotes the network parameters. Optimization is performed with Adam at learning rate $10^{-4}$, minibatch size 512, for 3 epochs.
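The loss itself is straightforward to express over a batch of already-embedded triplets; a minimal NumPy sketch (margin fixed at 1, as in the objective above):

```python
import numpy as np

def triplet_loss(fa, fp, fn, margin=1.0):
    """Mean hinge on the squared-distance gap over a batch of embedded
    triplets fa, fp, fn, each of shape (B, D)."""
    d_pos = np.sum((fa - fp) ** 2, axis=1)   # ||f(x^a) - f(x^p)||^2
    d_neg = np.sum((fa - fn) ** 2, axis=1)   # ||f(x^a) - f(x^n)||^2
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

When the negative is already farther from the anchor than the positive by more than the margin, the hinge is inactive and the triplet contributes zero gradient.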

4. Precision, Compactness, and Robustness

The descriptor family achieves high discriminative power through triplet metric learning, compactness (CGF-32 outperforms hand-crafted descriptors ranging from 33 to 1,980 dimensions), and robustness to real-world challenges such as noise and missing data, owing to training on diverse real and synthetic scans.

Evaluation follows these protocols:

  • Precision: Given two overlapping scans $P_i, P_j$ with ground-truth transforms $T_i, T_j$, feature-based correspondences $C_f$ are formed by nearest-neighbor search in feature space. Pairs whose ground-truth distance exceeds a coarse threshold are discarded, yielding the evaluated set $C'_f$. Precision at threshold $x$ is given by

$$\mathrm{precision}_f(x) = \frac{|\{(p,q) \in C'_f : \|T_i p - T_j q\| \le x\}|}{|C'_f|}$$

with typical values $x = 1\%$ of the model diameter (laser scans) or $x = 10$ cm (indoor scenes).

  • Registration metrics: RMSE of estimated versus ground-truth transformation, and recall (fraction of aligned fragment pairs within a distance threshold).
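The precision metric can be sketched as follows. This is a brute-force illustration under simplifying assumptions: nearest neighbors are found by exhaustive search rather than a kd-tree, and the coarse overlap filtering step is omitted.

```python
import numpy as np

def precision_at(feat_i, feat_j, pts_i, pts_j, T_i, T_j, x):
    """Fraction of feature-space nearest-neighbor matches whose
    ground-truth-aligned distance is within x. Brute-force NN sketch;
    no overlap filtering (the full protocol additionally discards
    pairs outside the overlap region)."""
    # pairwise squared feature distances (n_i, n_j); NN per i-point
    d2 = ((feat_i[:, None, :] - feat_j[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    # map both scans into the common ground-truth frame (4x4 transforms)
    qi = pts_i @ T_i[:3, :3].T + T_i[:3, 3]
    qj = pts_j @ T_j[:3, :3].T + T_j[:3, 3]
    dist = np.linalg.norm(qi - qj[nn], axis=1)     # ||T_i p - T_j q||
    return float(np.mean(dist <= x))
```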

5. Quantitative Comparison with Hand-Crafted Descriptors

Empirical results demonstrate the superiority of CGF descriptors over several hand-crafted alternatives on independent test sets. Key comparisons:

| Descriptor | Dimensionality | Laser precision@1% | SceneNN precision@10 cm | Laser query (ms) | SceneNN query (ms) |
|---|---|---|---|---|---|
| CGF-32 | 32 | 41.4 % | 50.6 % | 0.42 | 0.10 |
| Spin Images | 153 | 32.2 % | 8.2 % | 1.62 | 0.25 |
| FPFH | 33 | 28.1 % | 20.7 % | 0.04 | 0.02 |
| PFH | 125 | 24.5 % | 22.1 % | - | - |
| RoPS | 135 | 23.0 % | 22.7 % | - | - |
| SHOT | 352 | 22.5 % | 20.2 % | - | - |
| USC | 1,980 | 21.7 % | 29.8 % | 31.6 | 6.75 |

On the Redwood registration benchmark (no fine-tuning):

| Method | Recall (%) | Precision (%) |
|---|---|---|
| FGR + FPFH | 51.1 | 23.2 |
| CZK + FPFH | 59.2 | 19.6 |
| 3DMatch (volumetric) | 65.1 | 25.2 |
| FGR + CGF-32 | 60.7 | 9.4 |
| CZK + CGF-32 | 72.0 | 14.6 |

CGF achieves higher precision at substantially lower dimensionality and query time than existing descriptors.

6. Implementation and Practicalities

  • Histogram resolution: $R = 17$ (radial) × $E = 11$ (elevation) × $A = 12$ (azimuth), so $N = 2{,}244$
  • Sphere radius $r$: approximately 17% of the model diameter for laser data, or 1.2 m for SceneNN; $r_{\min}$ about 1.5% of the diameter, or 0.1 m
  • Local reference frame: estimated on a support sized at 2% of the diameter, or 0.25 m
  • Feature dimensionality $D$: typically 12–64, with 32 the best precision-compactness trade-off
  • Embedding network: 5 hidden layers × 512 ReLUs; inference costs 6 matrix multiplications (5 hidden plus 1 output) and 5 ReLU applications
  • Correspondence search uses $k$-d trees (FLANN) in $\mathbb{R}^D$, incurring $O(n \log n)$ query complexity per fragment of $n$ points
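As an illustration of the correspondence-search step, the sketch below uses SciPy's kd-tree as a stand-in for FLANN. The mutual-consistency filter is a common robustness heuristic added here for illustration, not something the paper prescribes.

```python
import numpy as np
from scipy.spatial import cKDTree  # SciPy kd-tree as a stand-in for FLANN

def mutual_correspondences(feat_i, feat_j):
    """Nearest-neighbor correspondences between two fragments in the
    D-dimensional CGF feature space, kept only when they are mutual."""
    _, nn_ij = cKDTree(feat_j).query(feat_i)   # NN of each i-feature in j
    _, nn_ji = cKDTree(feat_i).query(feat_j)   # NN of each j-feature in i
    idx = np.arange(len(feat_i))
    mutual = nn_ji[nn_ij] == idx               # i -> j -> i round trip
    return np.stack([idx[mutual], nn_ij[mutual]], axis=1)
```

Each kd-tree query is logarithmic in the number of stored features, which is where the per-fragment $O(n \log n)$ cost comes from.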

7. Limitations and Future Directions

The CGF paradigm requires training on representative overlapping scans with known alignments. Highly regular or repetitive local geometry can present failure cases due to ambiguity in the learned embedding. While performance is robust to common occlusion and noise, extreme or adversarial aliasing remains challenging. Possible future work includes exploring alternative deep metric learning objectives (lifted-structure loss, N-pair loss) or architectural variants such as residual MLPs or attention mechanisms, with the aims of further improving discriminative power and compactness of the learned feature space (Khoury et al., 2017).

References

  • Khoury, M., Zhou, Q.-Y., Koltun, V. (2017). Learning Compact Geometric Features. Proceedings of the IEEE International Conference on Computer Vision (ICCV).
