SC-Net: Two-View Correspondence Network

Updated 5 January 2026

SC-Net is a deep learning architecture for two-view correspondence that integrates bilateral spatial and channel context to produce robust motion field estimates.
It incorporates specialized modules including AFR, BFA, and PAR to enhance spatial localization, inlier robustness, and precise motion vector recovery.
SC-Net's modular design achieves state-of-the-art performance in relative pose estimation and outlier removal, advancing robust 2D/3D matching in complex scenes.

SC-Net is a deep learning architecture for two-view correspondence learning that integrates bilateral context from both spatial and channel dimensions to produce robust, accurate motion fields in relative pose estimation and outlier removal tasks. SC-Net addresses limitations of standard convolutional neural network (CNN) backbones, which may insufficiently aggregate global context and oversmooth dense motion fields, especially in scenes with large disparity. The architecture introduces a sequence of specialized modules, most notably the Adaptive Focused Regularization (AFR) module, a Bilateral Field Adjustment (BFA) module, and a Position-Aware Recovery (PAR) module, each contributing to precise and context-aware motion field estimation (Lin et al., 29 Dec 2025).

1. Architectural Structure and Components

SC-Net comprises a stack of $L$ rectifying layers, each containing three major submodules: AFR, BFA, and PAR. The computational pipeline starts with “unordered” motion features $M^{l-1} \in \mathbb{R}^{N\times C}$, where $N$ is the number of putative matches and $C$ the feature dimension, and fixed grid embeddings $G \in \mathbb{R}^{K^2\times C}$, corresponding to a $K\times K$ spatial grid.

Adaptive Focused Regularization (AFR): The first submodule within each rectifying layer, AFR transforms $M^{l-1}$ and $G$ into a sparse, position-sensitive motion field $F^l \in \mathbb{R}^{K^2\times C}$. It implements a multi-head graph attention mechanism, combining position-aware attention and soft filtering to enhance spatial localization and inlier robustness.
Bilateral Field Adjustment (BFA): Refines the motion field $F^l$ by simultaneously modeling interactions across spatial and channel dimensions, capturing long-range dependencies and facilitating cross-context information exchange.
Position-Aware Recovery (PAR): Recovers final motion vectors from the refined field, enforcing consistency and precision through explicit spatial referencing.

This modular structure enables each grid cell to selectively incorporate global and local context, supporting SC-Net’s efficacy in highly variable geometric configurations.
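The data flow through the stack can be sketched at the level of tensor shapes. This is an illustration under stated assumptions, not the authors' implementation: the three stage functions are hypothetical placeholders, the normalisation range of the grid coordinates is assumed, and the assumption that PAR returns per-match features of shape $N\times C$ is inferred only from the fact that the next layer consumes motion features again. Plain Python lists are used to keep the sketch dependency-free; a real model would use a tensor library.

```python
from typing import Callable, List

Matrix = List[List[float]]  # row-major: one inner list per row


def make_grid_coords(k: int) -> Matrix:
    """Cell-centre coordinates Y of a k x k grid, shape (k*k, 2).

    Normalising to the open interval (-1, 1) is an assumption; the text
    only states that the coordinates are normalised.
    """
    centres = [(2 * i + 1) / k - 1.0 for i in range(k)]
    return [[x, y] for y in centres for x in centres]


def run_rectifying_stack(
    m: Matrix,
    g: Matrix,
    num_layers: int,
    afr: Callable[[Matrix, Matrix], Matrix],
    bfa: Callable[[Matrix], Matrix],
    par: Callable[[Matrix, Matrix], Matrix],
) -> Matrix:
    """Apply num_layers rectifying layers in the order AFR -> BFA -> PAR."""
    for _ in range(num_layers):
        f = afr(m, g)   # (N, C) and (K^2, C) -> field F^l of shape (K^2, C)
        f = bfa(f)      # (K^2, C) -> (K^2, C), spatial/channel refinement
        m = par(f, m)   # (K^2, C) and (N, C) -> (N, C); output shape assumed
    return m


# Shape-preserving stand-ins so the sketch runs; they do no real work.
def stub_afr(m: Matrix, g: Matrix) -> Matrix:
    return [row[:] for row in g]


def stub_bfa(f: Matrix) -> Matrix:
    return [row[:] for row in f]


def stub_par(f: Matrix, m: Matrix) -> Matrix:
    return [row[:] for row in m]


if __name__ == "__main__":
    K, N, C, L = 16, 5, 8, 6  # N and C are arbitrary illustrative sizes
    Y = make_grid_coords(K)
    G = [[0.0] * C for _ in range(K * K)]
    M0 = [[1.0] * C for _ in range(N)]
    M_out = run_rectifying_stack(M0, G, L, stub_afr, stub_bfa, stub_par)
    print(len(Y), len(M_out), len(M_out[0]))
```

Passing the stages in as callables keeps the layer ordering explicit while leaving each module's internals open.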

2. Adaptive Focused Regularization Module

The AFR module serves as the core innovation in SC-Net’s correspondence reasoning stack. For each rectifying layer:

Input:
- Motion features $M^{l-1} \in \mathbb{R}^{N\times C}$
- Grid embeddings $G \in \mathbb{R}^{K^2\times C}$
- Normalized keypoint coordinates $X \in \mathbb{R}^{N\times 2}$ , $Y \in \mathbb{R}^{K^2\times 2}$
- Previous-layer inlier logits $\hat z_{cls}^{l-1} \in \mathbb{R}^{N}$
Operation:
- Graph Attention Backbone: Implements multi-head attention with $H$ heads. Consistent with the $K^2\times N$ shape of the attention maps, each head computes query ($Q$) projections from $G$ and key ($K$) and value ($V$) projections from $M^{l-1}$. Note that $K$ in $K^2$ denotes the grid size, not the key matrix.
- Position-Aware Bias: Constructs a positional correlation matrix $S = \Phi_Y \Phi_X^\top \in \mathbb{R}^{K^2\times N}$, where $\Phi_X$ and $\Phi_Y$ are shared MLP embeddings of $X$ and $Y$.
- Augmented Attention Logits: Each attention head’s score is modulated by $B_i = \psi(\alpha_i\, S/\sqrt{C} + \beta_i)$, adding explicit spatial relationships (with learnable $\alpha_i$, $\beta_i$ and $\psi=$ LeakyReLU).
- Soft Filtering: Values are weighted by inlier probability $p=\sigma(\hat z_{cls}^{l-1})$ via $\hat Z = \mathrm{diag}(p)$, reducing the influence of outliers.
- Output: Concatenated multi-head outputs yield $F^l = \mathrm{Concat}_i(O_i) W^O$, ready for further refinement.

This design enhances position-awareness and spatial selectivity, directly counteracting the oversmoothing issues present in vanilla graph attention networks (GAT) and CNN-based approaches.
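The operations listed above can be sketched for a single attention head. Several points are assumptions rather than details confirmed by the text: the bias $B_i$ is added to the scaled dot-product logits before the softmax (the text says only that the score is "modulated"), the inputs are taken to be already projected (the learned $W^Q$, $W^K$, $W^V$, $W^O$ and the positional MLP are omitted), and the function and argument names are hypothetical. Plain Python lists are used so the sketch has no dependencies.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def leaky_relu(v: float, slope: float = 0.2) -> float:
    return v if v >= 0 else slope * v


def sigmoid(v: float) -> float:
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)  # avoids overflow for large negative v
    return e / (1.0 + e)


def afr_head(q: Matrix, k: Matrix, v: Matrix, phi_y: Matrix, phi_x: Matrix,
             inlier_logits: List[float], alpha: float, beta: float) -> Matrix:
    """One AFR attention head (sketch).

    q:             (K^2, d) queries derived from grid embeddings G
    k, v:          (N, d)   keys / values derived from motion features
    phi_y, phi_x:  (K^2, C) and (N, C) positional embeddings of Y and X
    inlier_logits: N previous-layer inlier logits
    Returns O_i of shape (K^2, d).
    """
    d, c = len(q[0]), len(phi_x[0])
    s = matmul(phi_y, transpose(phi_x))        # S, shape (K^2, N)
    scores = matmul(q, transpose(k))           # content logits, (K^2, N)
    attn = []
    for score_row, s_row in zip(scores, s):
        logits = [
            a / math.sqrt(d)                                   # content term
            + leaky_relu(alpha * b / math.sqrt(c) + beta)      # B_i (added: assumption)
            for a, b in zip(score_row, s_row)
        ]
        attn.append(softmax(logits))
    p = [sigmoid(z) for z in inlier_logits]    # inlier probabilities
    v_filtered = [[pj * x for x in row] for pj, row in zip(p, v)]  # diag(p) V
    return matmul(attn, v_filtered)
```

In this sketch a match with a strongly negative inlier logit contributes almost nothing to any grid cell, regardless of how much attention it receives, which is the intended effect of soft filtering.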

3. Training Objective and Optimization

SC-Net employs a joint loss on each rectifying layer:

$\mathcal{L} = \sum_{l=0}^{L-1} \left[ \mathcal{L}_{cls}(\hat z_{cls}^l, z_{cls}) + \lambda\,\mathcal{L}_{reg}(\hat E^l, E) \right]$

$\mathcal{L}_{cls}$ : Binary cross-entropy for correspondence classification, with adaptive temperature $\tau$ .
$\mathcal{L}_{reg}$ : Regression loss aligning the predicted essential matrix $\hat E^l$ with ground truth $E$ .
$\lambda$ schedule: Ramps from $0\to 0.5$ after 20k training steps.
Optimization: Adam optimizer with initial learning rate $10^{-4}$.
No auxiliary loss is assigned directly to AFR; improvements in loss are observed end-to-end through enhanced correspondence and motion estimation.
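A minimal sketch of how the joint objective could be evaluated is given below. The text does not specify several details, so the following are assumptions: the temperature $\tau$ simply divides the logits and is held fixed here, the regression term is a sign-invariant squared distance between unit-normalised essential matrices (a common choice in this literature, not confirmed for SC-Net), and the $\lambda$ schedule is read as a switch from 0 to 0.5 at step 20k. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bce_with_logits(logits: Sequence[float], labels: Sequence[float],
                    tau: float = 1.0) -> float:
    """Mean binary cross-entropy on logits / tau (numerically stable form)."""
    total = 0.0
    for z, y in zip(logits, labels):
        s = z / tau
        total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
    return total / len(logits)


def essential_loss(e_pred: Sequence[float], e_true: Sequence[float]) -> float:
    """Squared distance between unit-normalised 9-vectors, up to sign.

    Assumed form: an essential matrix is defined only up to scale,
    so E and -E are treated as equivalent.
    """
    def unit(vec: Sequence[float]) -> List[float]:
        n = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / n for x in vec]

    a, b = unit(e_pred), unit(e_true)
    d_pos = sum((x - y) ** 2 for x, y in zip(a, b))
    d_neg = sum((x + y) ** 2 for x, y in zip(a, b))
    return min(d_pos, d_neg)


def lambda_at(step: int, switch_step: int = 20_000, final: float = 0.5) -> float:
    """One reading of the schedule: 0 before switch_step, 0.5 afterwards."""
    return 0.0 if step < switch_step else final


def joint_loss(per_layer: Sequence[Tuple[Sequence[float], Sequence[float]]],
               z_true: Sequence[float], e_true: Sequence[float],
               step: int) -> float:
    """Sum over layers of L_cls + lambda * L_reg.

    per_layer holds one (inlier logits, flattened 3x3 E estimate) pair
    for each rectifying layer.
    """
    lam = lambda_at(step)
    return sum(
        bce_with_logits(z_hat, z_true) + lam * essential_loss(e_hat, e_true)
        for z_hat, e_hat in per_layer
    )
```

Because every layer contributes its own classification and regression term, intermediate layers receive direct supervision even though no loss is attached to AFR itself.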

4. Hyper-Parameterization

Ablative analysis identifies optimal settings as:

Parameter	Value / Range	Effect
Grid size $K$	16 ( $K^2=256$ cells)	Balances spatial granularity and computational cost
Rectifying layers $L$	6	Empirically optimal vs. $L=4,8$
Attention heads $H$	4	Per-head dimension $C/H$
Position MLP $\mathcal{F}_3$	2-layer, ReLU, out dim $C$	For positional embedding of coordinates
LeakyReLU slope	0.2	Nonlinearity in positional bias

Scaling $S$ by $1/\sqrt{C}$ is critical for attention stability, and soft filtering via the sigmoid is critical for inlier weighting.
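For reference, the reported settings can be gathered into a small configuration object. This is only a sketch: the class and field names are hypothetical, and the feature dimension $C$ is not stated in the text, so the value used below is a placeholder chosen purely so that the derived quantities can be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SCNetConfig:
    grid_size: int = 16            # K
    num_layers: int = 6            # L
    num_heads: int = 4             # H
    pos_mlp_layers: int = 2        # positional MLP depth (ReLU, out dim C)
    leaky_relu_slope: float = 0.2  # slope of psi in the positional bias
    channels: int = 128            # C: NOT given in the text, placeholder only

    def __post_init__(self) -> None:
        if self.channels % self.num_heads != 0:
            raise ValueError("channels must be divisible by num_heads")

    @property
    def num_cells(self) -> int:
        """Number of grid cells, K^2."""
        return self.grid_size ** 2

    @property
    def head_dim(self) -> int:
        """Per-head dimension, C / H."""
        return self.channels // self.num_heads


if __name__ == "__main__":
    cfg = SCNetConfig()
    print(cfg.num_cells, cfg.head_dim)
```

Deriving `num_cells` and `head_dim` from the primary settings, rather than storing them, keeps the configuration from becoming inconsistent when $K$, $C$, or $H$ is changed.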

5. Empirical Results and Ablation

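On the YFCC100M dataset (known scenes), SC-Net demonstrates state-of-the-art performance. The incremental ablation below reports $mAP@5^\circ$, the mean average precision of the estimated relative pose at a $5^\circ$ error threshold, as components are added: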

Baseline (ConvMatch, no AFR/BFA): 46.09
+ HED (hierarchical encoder-decoder, BFA only): 57.96
+ MFM (motion-feature modulator): 59.60
+ SF (soft filtering in AFR): 61.96
+ SF + PA (full AFR): 64.35

The data identifies two main sources of improvement in AFR:

Soft Filtering (SF): +2.36 $mAP$ over the prior step.
Position-Aware Attention (PA): Additional +2.39 $mAP$ .

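Together with the HED and MFM steps, the full configuration reaches 64.35 $mAP$, a cumulative gain of 18.26 $mAP$ over the unregularized baseline (46.09). The two AFR components alone account for 4.75 $mAP$ of this gain, which highlights the importance of both spatially explicit attention mechanisms and inlier weighting for correspondence robustness.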
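The per-step and cumulative gains follow directly from the ablation figures and can be recomputed as a quick check. The scores are copied from the list above; the helper names are hypothetical.

```python
from typing import List, Tuple

# (configuration, mAP@5deg) in the order the components are added
ABLATION: List[Tuple[str, float]] = [
    ("Baseline", 46.09),
    ("+ HED", 57.96),
    ("+ MFM", 59.60),
    ("+ SF", 61.96),
    ("+ SF + PA", 64.35),
]


def step_gains(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Gain of each configuration over the one directly before it."""
    return [
        (name, round(score - prev_score, 2))
        for (_, prev_score), (name, score) in zip(rows, rows[1:])
    ]


def total_gain(rows: List[Tuple[str, float]]) -> float:
    """Gain of the last configuration over the first."""
    return round(rows[-1][1] - rows[0][1], 2)


if __name__ == "__main__":
    for name, gain in step_gains(ABLATION):
        print(f"{name}: +{gain:.2f}")
    print(f"total: +{total_gain(ABLATION):.2f}")
```

The largest single step is the hierarchical encoder-decoder (+11.87), while soft filtering and position-aware attention add +2.36 and +2.39 respectively on top of an already strong configuration.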

6. Design Rationale and Context within Correspondence Learning

Graph attention enables flexible information exchange between all matches and spatial locations but, in default form, is prone to spatial mixing and loss of locality. AFR’s position-aware bias enforces correspondence between spatial grid points and candidate matches based on geometric consistency, sharply localizing the attention. The soft filtering mechanism exploits intermediate classifier logits to suppress the effect of spurious, low-confidence motion samples, resulting in a more robust, outlier-tolerant estimate. By stacking these modules, SC-Net produces motion fields that retain high-frequency detail and spatial discontinuity, critical for challenging geometric scenes with large disparity or complex photometric differences.

The BFA and PAR modules further process the output of AFR, modeling joint spatial-channel context and enabling accurate recovery of motion vectors. The end-to-end design supports robust training and generalization without requiring dense supervision beyond keypoint correspondences and essential matrix annotation.

7. Applications and Extensions

SC-Net is benchmarked for relative pose estimation and outlier removal across large-scale correspondence datasets (YFCC100M, SUN3D). Its general design—bilateral spatial/channel context integration with explicit geometric localization—renders it suitable for problems in robotics, SLAM, structure from motion, and robust 2D/3D matching, especially where standard CNN or transformer methods struggle with global context aggregation and spatial discontinuity.

A plausible implication is that the AFR strategy—separately modeling positional and semantic relationships, together with confidence-weighted filtering—may be extensible to broader scene understanding or geometric reasoning tasks beyond correspondence learning, where spatial precision is crucial (Lin et al., 29 Dec 2025).

References (1)

Lin et al. (2025). SC-Net: Robust Correspondence Learning via Spatial and Cross-Channel Context.