Compact Geometric Features (CGF)
- Compact Geometric Features (CGF) are learned descriptors that encapsulate local 3D geometry from unstructured point clouds using spherical histograms.
- They use a deep multilayer perceptron to map high-dimensional histograms into compact Euclidean spaces, optimized with a triplet loss for discriminative power.
- CGF achieves high precision, compactness, and fast query times compared to traditional hand-crafted descriptors, making it ideal for robotics and 3D vision applications.
Compact Geometric Features (CGF) are learned representations that encapsulate the local geometry around a point in an unstructured point cloud. CGF is designed to facilitate geometric registration tasks central to robotics and 3D vision, where matching and aligning scans from different viewpoints or temporally separated acquisitions is essential. Unlike prior hand-crafted descriptors, CGF achieves precision, compactness, and robustness through a pipeline that maps high-dimensional spherical histograms of point neighborhoods into low-dimensional Euclidean feature spaces using deep neural networks, specifically a multi-layer perceptron (MLP) trained with a triplet embedding loss (Khoury et al., 2017).
1. Local Spherical Histogram Construction
Given an unstructured point cloud P and a central point p ∈ P, the method defines a spherical support region of radius r. A local reference frame at p is estimated by computing the normal n (e.g., by principal component analysis on a small neighborhood) and two orthogonal tangent vectors, ensuring a right-handed coordinate system.
Each neighbor q in the support region is converted to spherical coordinates (ρ, θ, φ) in the local frame, where ρ = ‖q − p‖ is the radial distance, θ is elevation, and φ is azimuth. The spherical region is discretized into:
- n_r radial bins: thresholds are logarithmically spaced between an inner radius r_min and r,
- n_e elevation bins: uniform, each of extent π/n_e,
- n_a azimuth bins: uniform, each of extent 2π/n_a.
With these subdivisions, the total number of bins is B = n_r · n_e · n_a. A normalized histogram h = (h_1, …, h_B) is built for each p, with h_i = |{q ∈ N(p) : bin(q) = i}| / |N(p)|, where bin(q) indicates the bin assignment of q, and N(p) is the set of points in the support region (excluding p itself). This histogram serves as the high-dimensional representation of the local geometric context.
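The histogram construction above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the bin counts, the inner radius handling, and the assumption that the local reference frame is given as a 3×3 matrix of row axes are all choices made here for clarity.

```python
import numpy as np

def spherical_histogram(points, p, lrf, r, r_min, n_r=4, n_e=4, n_a=8):
    """Normalized spherical histogram of the neighborhood of p.

    points : (N, 3) cloud points;  p : (3,) center point
    lrf    : (3, 3) rows are the local frame axes (tangent1, tangent2, normal)
    """
    local = (points - p) @ lrf.T            # express neighbors in the local frame
    rho = np.linalg.norm(local, axis=1)
    mask = (rho > 1e-12) & (rho <= r)       # support region, excluding p itself
    local, rho = local[mask], rho[mask]

    theta = np.arccos(np.clip(local[:, 2] / rho, -1.0, 1.0))   # elevation in [0, pi]
    phi = np.arctan2(local[:, 1], local[:, 0]) % (2 * np.pi)   # azimuth in [0, 2pi)

    # radial thresholds logarithmically spaced between r_min and r
    edges = np.geomspace(r_min, r, n_r)
    i_r = np.clip(np.searchsorted(edges, rho), 0, n_r - 1)
    i_e = np.minimum((theta / np.pi * n_e).astype(int), n_e - 1)
    i_a = np.minimum((phi / (2 * np.pi) * n_a).astype(int), n_a - 1)

    flat = (i_r * n_e + i_e) * n_a + i_a    # flatten 3D bin index
    hist = np.bincount(flat, minlength=n_r * n_e * n_a).astype(float)
    return hist / max(mask.sum(), 1)        # normalize by neighbor count
```

Because every in-support neighbor falls into exactly one bin, the returned histogram sums to 1 whenever the support region is non-empty.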
2. Deep Nonlinear Embedding of Histograms
To achieve compactness, the constructed histogram h ∈ R^B is mapped to a much lower-dimensional Euclidean space via a learned nonlinear embedding f : R^B → R^d, where typically d ≪ B. The embedding is implemented as a fully-connected MLP with the following architecture:
- Input layer: size B (the histogram dimensionality)
- Five hidden layers: each with 512 ReLU units
- Output layer: size d, linear activation
The forward operation is a chain of affine transformations interleaved with ReLU nonlinearities, f(h) = W_6 σ(W_5 σ(⋯ σ(W_1 h + b_1) ⋯) + b_5) + b_6, where σ is the elementwise ReLU, and the output f(h) ∈ R^d is the compact feature. Feature dimensionality d is a tunable parameter; in practice, values such as 12, 32, and 64 are used to balance compactness versus discriminative power. The resulting features are denoted CGF-d.
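The forward pass above can be sketched as a plain NumPy computation. The weights here are randomly initialized for illustration only; in the actual method they come from training with the triplet loss described next.

```python
import numpy as np

def init_mlp(b, d, hidden=512, n_hidden=5, seed=0):
    """Random He-style initialization of the MLP weights (illustrative only)."""
    rng = np.random.default_rng(seed)
    sizes = [b] + [hidden] * n_hidden + [d]
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def embed(hist, params):
    """Map a histogram to its compact feature: affine + ReLU layers, linear output."""
    x = hist
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)      # hidden layers with ReLU
    W, b = params[-1]
    return x @ W + b                        # linear output: the CGF feature
```

With b = 64 input bins and d = 32, `embed` reduces the descriptor to a 32-dimensional vector, matching the CGF-32 setting.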
3. Triplet Loss Metric Learning
The MLP parameters θ are optimized using a margin-based triplet loss. Training data consist of triplets (x_a, x_p, x_n), with x_a (anchor) and x_p (positive) being histograms from true correspondences (points at distance at most τ under the ground-truth alignment) and x_n (negative) drawn from non-correspondences (points farther than τ apart). The objective is L(θ) = Σ_triplets max(0, ‖f(x_a) − f(x_p)‖² − ‖f(x_a) − f(x_n)‖² + m), where f is the embedding network, m > 0 is the margin, and θ are the network parameters. Optimization is performed using Adam with a fixed learning rate, minibatch size 512, and 3 epochs.
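The per-triplet term of the objective can be sketched directly; the margin value m = 1.0 used here is an assumption for illustration, not a value taken from the source.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, margin=1.0):
    """Margin-based triplet loss on embedded descriptors.

    f_a, f_p, f_n : embedded anchor, positive, and negative features.
    The loss is zero once the negative is farther than the positive
    by at least the margin (in squared Euclidean distance).
    """
    d_pos = np.sum((f_a - f_p) ** 2)
    d_neg = np.sum((f_a - f_n) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

When anchor and positive coincide and the negative is far away, the hinge is inactive and the loss is zero; when all three coincide, the loss equals the margin.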
4. Precision, Compactness, and Robustness
The descriptor family achieves high discriminativeness due to the triplet metric learning, compactness (e.g., CGF-32 outperforms descriptors ranging from 33 to 1,980 dimensions), and robustness to real-world challenges such as noise or missing data, owing to training on diverse real and synthetic scans.
Evaluation follows these protocols:
- Precision: Given two overlapping scans and a ground-truth transform T*, feature-based correspondences C are formed by nearest-neighbor search in feature space. A pair (p, q) ∈ C is counted as correct if its ground-truth distance is within a threshold τ. Precision at threshold τ is given by P(τ) = |{(p, q) ∈ C : ‖T*(p) − q‖ ≤ τ}| / |C|, with typical values τ = 1% of the model diameter (laser scans) or τ = 10 cm (indoor scenes).
- Registration metrics: RMSE of estimated versus ground-truth transformation, and recall (fraction of aligned fragment pairs within a distance threshold).
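The precision protocol can be sketched as follows. Brute-force feature matching stands in for the k-d tree search used in practice, and all array names are illustrative.

```python
import numpy as np

def match_precision(feat_src, feat_dst, pts_src, pts_dst, T, tau):
    """Fraction of nearest-neighbor feature matches whose ground-truth
    residual under the known 4x4 transform T is within tau."""
    # brute-force nearest neighbor in feature space
    d2 = ((feat_src[:, None, :] - feat_dst[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)                       # index of best match per source point
    src_h = np.c_[pts_src, np.ones(len(pts_src))]  # homogeneous coordinates
    mapped = (src_h @ T.T)[:, :3]                # apply ground-truth alignment
    resid = np.linalg.norm(mapped - pts_dst[nn], axis=1)
    return (resid <= tau).mean()
```

As a sanity check, matching a cloud against itself with the identity transform and its own points as "features" yields precision 1.0.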
5. Quantitative Comparison with Hand-Crafted Descriptors
Empirical results demonstrate the superiority of CGF descriptors over several hand-crafted alternatives on independent test sets. Key comparisons:
| Descriptor | Dimensionality | Laser precision@1% | SceneNN precision@10cm | Laser query (ms) | SceneNN query (ms) |
|---|---|---|---|---|---|
| CGF-32 | 32 | 41.4 % | 50.6 % | 0.42 | 0.10 |
| Spin Images | 153 | 32.2 % | 8.2 % | 1.62 | 0.25 |
| FPFH | 33 | 28.1 % | 20.7 % | 0.04 | 0.02 |
| PFH | 125 | 24.5 % | 22.1 % | - | - |
| RoPS | 135 | 23.0 % | 22.7 % | - | - |
| SHOT | 352 | 22.5 % | 20.2 % | - | - |
| USC | 1980 | 21.7 % | 29.8 % | 31.6 | 6.75 |
On the Redwood registration benchmark (no fine-tuning):
| Method | Recall (%) | Precision (%) |
|---|---|---|
| FGR + FPFH | 51.1 | 23.2 |
| CZK + FPFH | 59.2 | 19.6 |
| 3DMatch (volumetric) | 65.1 | 25.2 |
| FGR + CGF-32 | 60.7 | 9.4 |
| CZK + CGF-32 | 72.0 | 14.6 |
CGF achieves higher precision at substantially lower dimensionality and query time than existing descriptors.
6. Implementation and Practicalities
- Histogram resolution: n_r (radial) × n_e (elevation) × n_a (azimuth) bins, which fixes the input dimensionality B
- Sphere radius r: approximately 17% of the model diameter for laser data, or 1.2 m for SceneNN; the inner radial threshold r_min is about 1.5% of the diameter or 0.1 m, respectively
- Local reference frame: estimated on a smaller support, about 2% of the model diameter or 0.25 m
- Feature size (): typically 12–64, with 32 optimal for precision-compactness trade-off
- Embedding network: five hidden layers of 512 ReLU units plus a linear output; total inference cost is six matrix multiplications and five ReLU applications
- Correspondence search uses k-d trees (FLANN) over the d-dimensional features, giving fast approximate nearest-neighbor queries per fragment (see the query times reported above)
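The query stage might look like the following sketch, using SciPy's cKDTree as a stand-in for FLANN; the descriptor arrays are synthetic placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 32))              # CGF-32 descriptors of one fragment
queries = db[:5] + 1e-3 * rng.normal(size=(5, 32))  # slightly perturbed copies

tree = cKDTree(db)                            # built once per fragment
dist, idx = tree.query(queries, k=1)          # nearest neighbor per query descriptor
```

Because the query descriptors are tiny perturbations of the first five database entries, each query recovers its own source index. Low feature dimensionality (d = 32 rather than hundreds or thousands) is precisely what keeps such tree-based queries fast.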
7. Limitations and Future Directions
The CGF paradigm requires training on representative overlapping scans with known alignments. Highly regular or repetitive local geometry can present failure cases due to ambiguity in the learned embedding. While performance is robust to common occlusion and noise, extreme or adversarial aliasing remains challenging. Possible future work includes exploring alternative deep metric learning objectives (lifted-structure loss, N-pair loss) or architectural variants such as residual MLPs or attention mechanisms, with the aims of further improving discriminative power and compactness of the learned feature space (Khoury et al., 2017).