
ALIKED Detector: Sparse Keypoint Extraction

Updated 2 September 2025
  • ALIKED Detector is a deep neural architecture that uses a Sparse Deformable Descriptor Head to extract keypoints and descriptors with geometric invariance.
  • It integrates a differentiable keypoint detection module and sparse NRE loss to reduce computation while ensuring high performance in visual tasks such as image matching and 3D reconstruction.
  • Benchmarked at over 125 FPS with near-perfect accuracy in visual relocalization, the method is especially suited for real-time and resource-constrained applications.

The ALIKED detector is a deep neural architecture for joint extraction of image keypoints and descriptors, designed to provide computationally efficient and geometrically invariant feature representations for visual measurement tasks. By leveraging a Sparse Deformable Descriptor Head (SDDH) and relaxing traditional dense loss formulations to a sparse regime, ALIKED advances the state of the art in image matching, 3D reconstruction, and visual relocalization, explicitly focusing on efficiency while maintaining high expressiveness.

1. Architectural Overview and Motivation

ALIKED aims to resolve the inefficiencies of dense descriptor extraction, which are prevalent in conventional methods such as SIFT, SuperPoint, and dense CNN-based approaches. The detector fuses a Differentiable Keypoint Detection (DKD) module with the SDDH, confining descriptor processing to detected keypoints rather than entire image grids.

Key design objectives include:

  • Efficient extraction of keypoints and descriptors suitable for real-time and resource-constrained applications, such as SLAM and mobile visual localization.
  • Enabling geometric invariance in descriptors by adaptively sampling supporting features with learned, deformable offsets.

This focus on sparse extraction contrasts with older methods that generate dense score maps and descriptor maps for every pixel. Restricting descriptor computation to detected keypoints saves resources without sacrificing performance.

2. Sparse Deformable Descriptor Head (SDDH)

The SDDH is a fundamental innovation of ALIKED, designed to model freeform geometric changes through learned sampling positions. For each detected keypoint, descriptor extraction is performed on a local patch (e.g., 5×5 pixels) using the following procedure:

  • M deformable sample positions are predicted via a neural subnetwork:

$$p^s = \text{conv}_{1\times 1}\big(\text{SELU}(\text{conv}_{K\times K}(F_{K\times K}))\big)$$

where $F_{K\times K}$ is the $K \times K$ feature patch around the keypoint and $p^s \in \mathbb{R}^{M \times 2}$ encodes the offsets for bilinear sampling.

  • Bilinear sampling is conducted on the global feature map at these offsets, followed by encoding with $\Phi(\cdot)$ and aggregation using learned weights $w_M$:

$$d = \sum_{i=1}^{M} w_M(p_i) \cdot \Phi(F(p + p_i^s))$$

  • The number of locations $M$ is tunable, decoupled from the fixed $K \times K$ convolution grid, which enhances the flexibility to model non-affine deformations.

This patch-sparse, position-adaptive descriptor formulation increases the expressiveness of descriptors while minimizing redundant computation.
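The sampling-and-aggregation step above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the offsets are hand-picked rather than predicted by the conv/SELU subnetwork, the weights are scalars per sample position, the encoder `encode` (standing in for $\Phi$) defaults to the identity, and the names `bilinear_sample` and `sddh_descriptor` are hypothetical.

```python
import math

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feat (H x W x C nested lists) at real-valued (x, y)."""
    h, w = len(feat), len(feat[0])
    # Assumption: clamp coordinates so out-of-range offsets reuse border values.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    return [
        (1 - ay) * ((1 - ax) * feat[y0][x0][c] + ax * feat[y0][x1][c])
        + ay * ((1 - ax) * feat[y1][x0][c] + ax * feat[y1][x1][c])
        for c in range(len(feat[0][0]))
    ]

def sddh_descriptor(feat, keypoint, offsets, weights, encode=lambda v: v):
    """d = sum_i w_i * encode(F(p + p_i^s)) with scalar weights (an assumption)."""
    px, py = keypoint
    desc = None
    for (dx, dy), w in zip(offsets, weights):
        sample = encode(bilinear_sample(feat, px + dx, py + dy))
        if desc is None:
            desc = [0.0] * len(sample)
        for c, value in enumerate(sample):
            desc[c] += w * value
    return desc

# Toy 4x4 feature map with two channels: channel 0 holds x, channel 1 holds y.
feat = [[[float(x), float(y)] for x in range(4)] for y in range(4)]
offsets = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5)]  # M = 3, hand-picked for the demo
weights = [0.5, 0.25, 0.25]
desc = sddh_descriptor(feat, (1.0, 1.0), offsets, weights)
print(desc)
```

Because the number of offsets is just the length of a list, M can be changed without touching the sampling code, which mirrors the decoupling of M from the K×K grid described above.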

3. Descriptor Extraction Mechanism

ALIKED extracts robust descriptors only at sparse, salient keypoint locations, diverging from the dense sampling paradigm. This enables:

  • Computational savings, as the network does not process all pixel positions but only those proximal to keypoints.
  • Increased network capacity per descriptor, facilitating stronger geometric invariance and discriminative power.
  • High frame rates and lower GFLOPs consumption, benchmarking at over 125 FPS for 640×480 images with 1,000 keypoints for the ALIKED-T(16) variant.

This approach enables a targeted, high-throughput feature pipeline with minimal loss in precision.
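A back-of-the-envelope calculation makes the scale of the saving concrete. The sketch below only counts descriptor positions for the 640×480, 1,000-keypoint setting quoted above and reuses the 5×5 patch size from Section 2. It is not a FLOPs estimate and says nothing about the cost of the feature encoder; all variable names are illustrative.

```python
width, height = 640, 480
num_keypoints = 1000
patch_side = 5      # the 5x5 patch example from Section 2
target_fps = 125    # lower bound quoted for ALIKED-T(16)

dense_positions = width * height                      # one descriptor per pixel
sparse_positions = num_keypoints                      # one descriptor per keypoint
position_ratio = dense_positions / sparse_positions   # how many times fewer descriptors
patch_pixels = num_keypoints * patch_side ** 2        # pixels covered by all keypoint patches
frame_budget_ms = 1000.0 / target_fps                 # time available per frame at that rate

print(dense_positions, sparse_positions, position_ratio, patch_pixels, frame_budget_ms)
```

Under these assumptions a dense head would produce a few hundred times more descriptors than the sparse head, and the whole pipeline has a budget of at most 8 ms per frame to reach 125 FPS.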

4. Sparse Neural Reprojection Error (NRE) Loss

Traditional dense loss formulations, such as the Neural Reprojection Error (NRE), require matching likelihoods for all pixels, impeding efficiency. ALIKED introduces a sparse adaptation:

  • For a keypoint descriptor $d_A$ in image A and a set of descriptors $D_B$ in image B, compute the similarity vector:

$$\text{sim}(d_A, D_B) = D_B \cdot d_A$$

  • Compute the matching probability vector via softmax with temperature $t_{des}$:

$$q_m(d_A, D_B) = \text{softmax}\left(\frac{\text{sim}(d_A, D_B) - 1}{t_{des}}\right)$$

  • The sparse NRE loss for a keypoint $p_A$ is:

$$\mathcal{L}_{ds}(p_A, I_B) = -\ln\left(q_m(d_A, d_B)\right)$$

where $d_B$ is the descriptor at the matched keypoint in image B, and $q_m(d_A, d_B)$ denotes the entry of $q_m(d_A, D_B)$ that corresponds to $d_B$.

This relaxation to sparse probability vectors reduces memory demand and focuses optimization on keypoint-centric representations.
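The three steps of the sparse loss can be written out directly. The following is a minimal sketch, assuming unit-normalized descriptors (so similarities lie in [-1, 1]) and an arbitrary temperature of 0.1 that is not taken from the source. The function names are hypothetical, and the log-softmax is computed in a numerically stable form rather than by taking the logarithm of the softmax output.

```python
import math

def matching_log_probs(d_a, descs_b, t_des):
    """Return ln q_m(d_A, D_B) as a list with one entry per descriptor in D_B."""
    # Similarity vector: dot product of d_A with every descriptor in D_B.
    sims = [sum(x * y for x, y in zip(d_a, d_b)) for d_b in descs_b]
    # Shift by 1 and divide by the temperature, as in the softmax argument.
    logits = [(s - 1.0) / t_des for s in sims]
    # Stable log-sum-exp: subtract the maximum before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]

def sparse_nre_loss(d_a, descs_b, match_index, t_des):
    """-ln of the matching probability at the ground-truth match in image B."""
    return -matching_log_probs(d_a, descs_b, t_des)[match_index]

# Toy example: three unit-length descriptors in image B.
d_a = [1.0, 0.0]
descs_b = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
t_des = 0.1  # arbitrary value chosen for the demo

loss_correct = sparse_nre_loss(d_a, descs_b, match_index=0, t_des=t_des)
loss_wrong = sparse_nre_loss(d_a, descs_b, match_index=1, t_des=t_des)
print(loss_correct, loss_wrong)
```

The loss is close to zero when the ground-truth match is also the most similar descriptor and grows roughly as the similarity gap divided by the temperature otherwise. Only the descriptors at detected keypoints in image B enter the softmax, which is where the memory saving over a per-pixel formulation comes from.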

5. Quantitative Performance Evaluation

ALIKED exhibits competitive or superior results across visual measurement benchmarks:

| Task/Benchmark | Model/Variant | Metric(s) | Performance |
| --- | --- | --- | --- |
| HPatches (Homography) | ALIKED-T(16) | MHA@3 | 78.70% |
| IMW (3D Reconstruction) | ALIKED-T/N variants | MS, repeatability | Higher than DISK, ASLFeat |
| Aachen Day-Night (Reloc.) | ALIKED-N(32) | Accuracy at 0.25m, 2° | Nearly 100% correct matches |

A high frame rate (over 125 FPS for ALIKED-T(16) on 640×480 images with 1,000 keypoints) and a low GFLOPs count further support its suitability for real-time applications.

6. Mathematical Formulation of Core Operations

The ALIKED detector is grounded in several key mathematical formulations:

  • Deformable Transformation:

$$[x', y']^T = [x, y]^T + [\Delta x, \Delta y]^T$$

  • DCN-style deformable convolution (for context):

$$F'(p) = \sum_{i=1}^{K^2} w(p_i) \cdot F(p + p_i + \Delta p_i)$$

  • Learned deformable position offsets, encoding function, and aggregation (see Section 2 above).
  • Sparse NRE loss and softmax-based similarity metrics (see Section 4 above).

These formulations delineate both the feature extraction and the loss-driven optimization processes central to ALIKED.
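For context, the DCN-style formula can be checked numerically on a toy grid. The sketch below is an illustration only: it handles a single channel and a single output position, pads with zeros outside the grid, and takes the offsets $\Delta p_i$ as given rather than predicting them. The names `bilinear` and `deformable_conv_at` are hypothetical.

```python
import math

def bilinear(feat, x, y):
    """Bilinear lookup on a single-channel H x W grid; zero outside the grid."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                value += wy * wx * feat[yi][xi]
    return value

def deformable_conv_at(feat, p, kernel, deltas):
    """F'(p) = sum_i w(p_i) * F(p + p_i + delta_p_i) for a K x K kernel."""
    k = len(kernel)
    r = k // 2
    px, py = p
    out = 0.0
    i = 0
    for ky in range(k):
        for kx in range(k):
            dx, dy = deltas[i]  # learned in a real network, fixed here
            out += kernel[ky][kx] * bilinear(feat, px + (kx - r) + dx, py + (ky - r) + dy)
            i += 1
    return out

# Linear ramp feat[y][x] = 4*y + x, so bilinear interpolation is exact.
feat = [[float(4 * y + x) for x in range(4)] for y in range(4)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # 3x3 box filter

plain = deformable_conv_at(feat, (1, 1), kernel, [(0.0, 0.0)] * 9)
shifted = deformable_conv_at(feat, (1, 1), kernel, [(0.5, 0.0)] * 9)
print(plain, shifted)
```

With all offsets set to zero the function reduces to an ordinary 3×3 convolution at that position, and a uniform half-pixel offset in x shifts the result accordingly. In this formulation the number of samples is tied to the kernel size K², whereas the SDDH of Section 2 uses M freely chosen sample positions.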

7. Outlook and Future Development

The paper proposes several future directions:

  • Integration of joint training for keypoint detection and descriptor extraction in a unified pipeline.
  • Optimization for deployment on embedded or mobile platforms, potentially requiring quantization of descriptors.
  • Enhancement of deformable descriptor extraction to handle greater variances in viewpoint and scale, suggesting multi-layer or deeper sampling architectures.

These advances target further improvements in geometric invariance, memory efficiency, and hardware compatibility, with applications in tasks that demand robust, real-time feature extraction.


In summary, ALIKED introduces sparse keypoint-based descriptor extraction via deformable sampling, relaxation of dense loss formulations, and comprehensive efficiency benchmarks, establishing a foundation for high-performance, resource-efficient visual measurement pipelines (Zhao et al., 2023).

References (1)
