SuperGlue: Deep Learning Feature Matching

Updated 29 January 2026
  • SuperGlue is a deep learning feature matching framework that uses optimal transport and graph neural network inference for robust two-view correspondence.
  • It employs joint attention-based context aggregation with Sinkhorn iterations to compute sparse, one-to-one assignments between image features in real time.
  • Empirical evaluations demonstrate that SuperGlue outperforms traditional methods, achieving higher precision and robust performance across various domains.

SuperGlue is a deep learning-based feature matching framework designed to solve the two-view correspondence problem with differentiable optimal transport and graph neural network inference. Originally introduced for image matching and pose estimation, SuperGlue is now recognized for its impact on computer vision and remote sensing applications due to its robustness to appearance changes, geometric distortions, and multimodal sensor data. The architecture is characterized by joint attention-based context aggregation and end-to-end learned assignment, substantially outperforming hand-engineered and shallow-learned matching heuristics under challenging conditions (Sarlin et al., 2019).

1. Problem Formulation and Theoretical Principles

SuperGlue tackles the problem of matching two sets of local features (keypoints and descriptors) from images A and B. Let image A have N keypoints $\{x_i = (p_i, d_i)\}_{i=1}^{N}$ and image B have M keypoints $\{y_j = (q_j, e_j)\}_{j=1}^{M}$, where $p_i, q_j \in \mathbb{R}^2$ are locations and $d_i, e_j \in \mathbb{R}^D$ are feature descriptors. The goal is to compute a partial, one-to-one assignment matrix $P \in [0,1]^{N \times M}$ such that $P_{ij}$ encodes the confidence that $x_i$ matches $y_j$.

The assignment is constructed by solving the entropy-regularized optimal transport problem:

$$P^* = \underset{P \geq 0}{\arg\min} \sum_{ij} P_{ij} C_{ij} - \epsilon H(P)$$

subject to row and column sum constraints, with $H(P)$ the entropy of $P$. To allow for unmatched keypoints ("occlusions"), the assignment is augmented with dustbin nodes, and Sinkhorn iterations are used to approximate the solution in a differentiable way (Sarlin et al., 2019).
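The dustbin-augmented Sinkhorn normalization can be sketched in a few lines of NumPy. This is an illustrative sketch, not the reference implementation: the dustbin score, entropy weight `eps`, and iteration count are placeholder values, and each real keypoint is given unit mass while the dustbins absorb the remaining mass of the opposite image.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_with_dustbins(scores, dustbin_score=1.0, eps=0.1, n_iters=100):
    """Entropy-regularized partial assignment between two keypoint sets.

    scores: (N, M) similarity matrix between features of images A and B.
    Returns an (N+1, M+1) transport plan whose last row/column are dustbins.
    """
    N, M = scores.shape
    # Augment the score matrix with a dustbin row and column so that any
    # keypoint can remain unmatched.
    aug = np.full((N + 1, M + 1), dustbin_score, dtype=np.float64)
    aug[:N, :M] = scores

    log_K = aug / eps                      # log kernel of the Gibbs distribution
    log_u = np.zeros(N + 1)
    log_v = np.zeros(M + 1)
    # Target marginals: unit mass per real keypoint, M (resp. N) for the dustbins.
    log_r = np.log(np.concatenate([np.ones(N), [M]]))
    log_c = np.log(np.concatenate([np.ones(M), [N]]))

    # Log-domain Sinkhorn iterations: alternate row and column normalization.
    for _ in range(n_iters):
        log_u = log_r - logsumexp(log_K + log_v[None, :], axis=1)
        log_v = log_c - logsumexp(log_K + log_u[:, None], axis=0)
    return np.exp(log_u[:, None] + log_K + log_v[None, :])
```

The inner N×M block of the returned matrix is read as match confidences; in practice, mutual-nearest selection and a confidence threshold (e.g., 0.2) pick the final correspondences.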

2. Graph Neural Network Architecture and Context Aggregation

SuperGlue utilizes a multi-layer attentional graph neural network to aggregate context over keypoints. The architecture alternates between self-attention layers (intra-image context propagation) and cross-attention layers (inter-image matching cues):

  • Each keypoint is first embedded by combining its visual descriptor with an MLP encoding of its position (and detection confidence).
  • Within each layer, each node attends over all nodes of a source image (the same image for self-attention, the other image for cross-attention):

$$\alpha_{ij} = \mathrm{softmax}_j\left(\frac{q_i^\top k_j}{\sqrt{d}}\right), \qquad m_i = \sum_j \alpha_{ij} v_j$$

where $q_i, k_j, v_j$ are query, key, and value projections of the node features.

After $L$ layers, final matching descriptors $f^A$ and $f^B$ are produced for the two images.
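One such attentional propagation layer can be sketched compactly in PyTorch. This follows the equations above but is an illustrative simplification (a single attention head and an assumed feature dimension), not the reference implementation; passing the layer the same image's features gives self-attention, passing the other image's features gives cross-attention.

```python
import torch
import torch.nn as nn

class AttentionalPropagation(nn.Module):
    """One message-passing layer: scaled dot-product attention over a source
    set, followed by an MLP update with a residual connection (sketch only)."""

    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.update = nn.Sequential(
            nn.Linear(2 * dim, 2 * dim), nn.ReLU(), nn.Linear(2 * dim, dim))

    def forward(self, x: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # x:      (N, dim) features of the image being updated
        # source: features attended over (same image = self-attention,
        #         other image = cross-attention)
        q, k, v = self.q_proj(x), self.k_proj(source), self.v_proj(source)
        attn = torch.softmax(q @ k.t() / q.shape[-1] ** 0.5, dim=-1)
        message = attn @ v                                   # aggregated context m_i
        return x + self.update(torch.cat([x, message], dim=-1))
```

Stacking $L$ such layers while alternating the source between the same and the other image implements the context aggregation described above.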

3. Matching, Optimal Assignment, and Loss

The refined descriptors yield a score matrix $S_{ij} = (f^A_i)^\top f^B_j$, which is augmented with a dustbin row and column and converted to a cost for optimal transport. Sinkhorn iterations produce a soft assignment matrix $\bar P$ satisfying the row and column constraints. The training objective is a negative log-likelihood loss over ground-truth correspondences, with penalties on both matchable and unmatched keypoints:

$$L = -\sum_{(i,j) \in \mathcal{M}} \log \bar P_{ij} \;-\; \sum_{i \in I} \log \bar P_{i,M+1} \;-\; \sum_{j \in J} \log \bar P_{N+1,j}$$

where $(i,j) \in \mathcal{M}$ are the ground-truth matches and $I$, $J$ index the unmatched keypoints in images A and B, respectively (Sarlin et al., 2019).
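Assuming $\bar P$ is available as a log-probability matrix of shape (N+1, M+1), with the last row and column acting as dustbins, this loss reduces to a few lines of PyTorch. Variable names and the normalization by the number of supervision terms are illustrative choices rather than details from the paper.

```python
import torch

def matching_loss(log_P, gt_matches, unmatched_a, unmatched_b):
    """Negative log-likelihood over a dustbin-augmented assignment.

    log_P:       (N+1, M+1) element-wise log of the Sinkhorn output.
    gt_matches:  LongTensor (K, 2) of ground-truth (i, j) correspondences.
    unmatched_a: LongTensor of indices in A with no valid match.
    unmatched_b: LongTensor of indices in B with no valid match.
    """
    N, M = log_P.shape[0] - 1, log_P.shape[1] - 1
    loss = -log_P[gt_matches[:, 0], gt_matches[:, 1]].sum()
    loss = loss - log_P[unmatched_a, M].sum()   # A-keypoints sent to the dustbin column
    loss = loss - log_P[N, unmatched_b].sum()   # B-keypoints sent to the dustbin row
    n_terms = len(gt_matches) + len(unmatched_a) + len(unmatched_b)
    return loss / max(n_terms, 1)
```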

4. Empirical Performance and Comparative Evaluation

A series of studies demonstrate SuperGlue's superior accuracy and robustness across domains:

| Task / Domain | Precision | RMS Error | Success Rate | Notes |
|---|---|---|---|---|
| Homography (synthetic images) | 90.7% | 3 px | – | Outperforms OANet; high AUC with the direct DLT estimator (Sarlin et al., 2019) |
| Indoor pose (ScanNet) | 84.4% | – | 51.8% (AUC@20°) | Best among PointCN, OANet, SIFT baselines (Sarlin et al., 2019) |
| Lunar image registration | – | 0.62 px (equator) | 100% (polar) | Outperforms SIFT and RIFT2; handles geometric/radiometric distortion (Makharia et al., 5 Sep 2025) |
| Multi-date satellite stereo | – | 1.18 m (height RMSE) | 86% | Outperforms SIFT, LightGlue, and detector-free methods for epipolar geometry (Song et al., 2024) |

SuperGlue achieves real-time inference on standard GPUs (≈70 ms per pair for 512 keypoints) and delivers substantially more reliable pose estimation and registration than classical matchers under challenging lighting and sensor conditions (Sarlin et al., 2019, Makharia et al., 5 Sep 2025, Song et al., 2024).

5. Implementation Regimes and Preprocessing Pipelines

SuperGlue is typically deployed atop learned keypoint detectors such as SuperPoint, requiring pre-extraction of local features. Domain-appropriate preprocessing (e.g., contrast-limited adaptive histogram equalization, PCA enhancement for lunar images, sub-pixel Least Squares Matching for satellite tie-points) is strongly recommended for cross-sensor registration (Makharia et al., 5 Sep 2025, Song et al., 2024). Default network hyperparameters (9 GNN layers, Sinkhorn temperature ≈ 0.1, dustbin confidence threshold 0.2) generalize well to remote sensing and scene understanding tasks.
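As an example of such domain preprocessing, the snippet below applies CLAHE contrast normalization and optional downscaling with OpenCV before feature extraction. The clip limit, tile size, and maximum image size are illustrative defaults, not values prescribed by the cited studies.

```python
import cv2
import numpy as np

def preprocess_for_matching(path, clip_limit=2.0, tile=(8, 8), max_size=1024):
    """Load an image, enhance local contrast with CLAHE, and downscale it so
    the longer side does not exceed max_size (returns a float image in [0, 1])."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(path)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    img = clahe.apply(img)
    scale = max_size / max(img.shape)
    if scale < 1.0:  # only shrink, never upsample
        img = cv2.resize(img, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_AREA)
    return img.astype(np.float32) / 255.0
```

The resulting image is then fed to the keypoint detector (e.g., SuperPoint) whose keypoints and descriptors SuperGlue consumes.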

6. Strengths, Limitations, and Extensions

SuperGlue’s core advantages are:

  • Learned geometric and photometric priors yield high match precision and robust outlier rejection.
  • Sinkhorn-based assignment enforces global one-to-one constraints and soft occlusion handling.
  • Attention-based global context makes it robust to appearance change and geometric transformation.

Limitations include the requirement for a strong keypoint detector in low-contrast regions and relatively high compute/memory footprint versus classic algorithms. It presently focuses on two-view matching; multi-view graph matching and deeper integration with downstream pose/SLAM remain avenues for extension (Sarlin et al., 2019).

7. Practical Recommendations and Best Practices

  • Use hybrid pipelines: fall back on SuperGlue when SIFT/AKAZE fail due to radiometry or geometry issues (Song et al., 2024).
  • Always apply sub-pixel refinement (e.g., LSM) and robust relative orientation (RANSAC, RPC correction) downstream of SuperGlue matches (Makharia et al., 5 Sep 2025); a minimal verification sketch follows this list.
  • No domain-specific re-training is needed in many cases, but for further improvement under novel sensors, retraining on relevant modalities is suggested.
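As a concrete example of the RANSAC-based verification recommended above, the sketch below filters SuperGlue's index pairs with OpenCV's robust fundamental-matrix estimator; the array layout and reprojection threshold are assumptions for illustration.

```python
import cv2
import numpy as np

def verified_matches(kpts_a, kpts_b, matches, ransac_thresh=3.0):
    """Keep only correspondences consistent with a RANSAC-fitted fundamental matrix.

    kpts_a, kpts_b: (Na, 2) and (Nb, 2) arrays of keypoint locations.
    matches:        (K, 2) array of (i, j) index pairs produced by the matcher.
    """
    pts_a = np.float32([kpts_a[i] for i, _ in matches])
    pts_b = np.float32([kpts_b[j] for _, j in matches])
    F, inlier_mask = cv2.findFundamentalMat(
        pts_a, pts_b, cv2.FM_RANSAC, ransac_thresh, 0.999)
    if F is None:  # degenerate configuration or too few matches
        return np.empty((0, 2), dtype=int), None
    keep = inlier_mask.ravel().astype(bool)
    return np.asarray(matches)[keep], F
```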

In summary, SuperGlue represents a state-of-the-art solution for two-view feature matching, demonstrating adaptability and precision across natural scene, satellite, and cross-modality remote sensing data. Its graph neural network architecture and optimal transport-based assignment push the limits of learning-based matching under challenging real-world conditions (Sarlin et al., 2019, Makharia et al., 5 Sep 2025, Song et al., 2024).
