Ada-RefSR: Adaptive Ref Diffusion SR
- The paper introduces Ada-RefSR, which employs Adaptive Implicit Correlation Gating (AICG) to verify and selectively fuse reference cues, mitigating issues from misaligned inputs.
- It utilizes a single-step diffusion backbone that speeds up inference by over 30× while preserving restoration fidelity through a two-phase 'Trust but Verify' approach.
- Experimental results show Ada-RefSR outperforms prior methods on benchmarks like CUFED5 and WRSR, achieving higher PSNR, SSIM, and lower LPIPS scores.
Ada-RefSR is a reference-based diffusion super-resolution (RefSR) framework designed to address the challenges of leveraging unreliable or misaligned reference images in real-world image restoration. Guided by a "Trust but Verify" principle, Ada-RefSR adaptively fuses reference information, maximizing the use of useful cues while suppressing misleading content. The system introduces Adaptive Implicit Correlation Gating (AICG), which conditions reference fusion on implicit token-level correlations. Built on a single-step diffusion backbone, Ada-RefSR achieves a favorable combination of fidelity, efficiency, and robustness, outperforming prior explicit matching and global gating solutions in varied RefSR benchmarks (Wang et al., 2 Feb 2026).
1. The "Trust but Verify" Principle in RefSR
RefSR augments the low-quality (LQ) input with a high-resolution reference (Ref), providing guidance for generating visually plausible, high-frequency details. A predominant obstacle in practical settings is the unreliability of LQ–Ref correspondence—degradations, misalignments, and irrelevant retrievals often break semantic and spatial consistency. Excessive dependence on such reference cues introduces artifacts or hallucinations; insufficient use forfeits valuable information.
Ada-RefSR formalizes a two-phase protocol:
- Trust: Aggressively inject reference patterns to capture all potentially relevant cues, with an emphasis on recall.
- Verify: Apply adaptive verification to suppress semantically inconsistent or unreliable contributions, increasing precision.
This dual-phase philosophy is operationalized through architectural and algorithmic innovations centered around reference attention and adaptive gating.
2. Architectural Components and Methodology
2.1 Single-Step Diffusion Backbone
Ada-RefSR leverages a single-step super-resolution diffusion backbone (e.g., S3Diff distilled from Stable Diffusion), freezing its weights and using a single feedforward step for inference. This results in inference speeds over 30× faster than classical multi-step diffusion, while maintaining restoration priors.
2.2 Reference Attention: Trust Phase
Feature representations from the LQ input (F_LQ) and Ref (F_Ref) are projected to queries, keys, and values: Q = F_LQ·W_Q, K = F_Ref·W_K, V = F_Ref·W_V. Vanilla reference attention (RA) is computed as RA(F_LQ, F_Ref) = softmax(Q·Kᵀ/√d)·V. A residual addition of F_LQ preserves the prior from the diffusion backbone.
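The Trust-phase update can be sketched as cross-attention with a residual connection. The following is a minimal NumPy illustration, not the paper's implementation; the projection matrices and shapes are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reference_attention(f_lq, f_ref, w_q, w_k, w_v):
    """Trust phase: queries from LQ features, keys/values from Ref features,
    plus a residual that preserves the frozen diffusion backbone's prior.
    All weight matrices here are illustrative placeholders."""
    q, k, v = f_lq @ w_q, f_ref @ w_k, f_ref @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N_lq, N_ref)
    return f_lq + attn @ v  # residual addition keeps the backbone prior

# Toy usage with random tokens (d = 8 channels).
rng = np.random.default_rng(0)
d = 8
f_lq = rng.standard_normal((5, d))   # 5 LQ tokens
f_ref = rng.standard_normal((7, d))  # 7 Ref tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = reference_attention(f_lq, f_ref, w_q, w_k, w_v)
```

Note that if the value projection contributes nothing, the residual leaves F_LQ untouched, which is what lets the backbone prior survive aggressive injection.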
2.3 Adaptive Implicit Correlation Gating (AICG): Verify Phase
AICG mitigates erroneous fusion by estimating token-wise reliability:
- Reference Summarization: Learnable summary tokens S distill major patterns from F_Ref via attention aggregation.
- Correlation and Gating: Softmax-based attention between F_LQ and the summary tokens yields a correlation map A; the per-token gate g is computed as the average attention over summary tokens, followed by an elementwise sigmoid.
- Gated Fusion: The reference attention output is modulated by g: F_out = F_LQ + g ⊙ RA(F_LQ, F_Ref). Mismatched regions (low g) are suppressed; reliable matches (high g) are reinforced.
An equivalent interpretation treats g as a modulation of the key/value contributions within the standard attention update.
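The Verify-phase steps above can be sketched as follows. This is a hedged NumPy reading of the mechanism, not the paper's code: the summarization is written as one cross-attention pass, and the normalization axis for the correlation map is our assumption (we normalize over LQ tokens so the average over summary tokens varies per token):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def summarize_reference(f_ref, s_init):
    """Learnable summary tokens attend over Ref features (attention aggregation)."""
    d = f_ref.shape[-1]
    a = softmax(s_init @ f_ref.T / np.sqrt(d), axis=-1)  # (M, N_ref)
    return a @ f_ref                                     # (M, d) distilled summaries

def aicg_gated_fusion(f_lq, summaries, ra_out):
    """Verify phase: per-token gate g from LQ-vs-summary attention,
    then gated residual fusion of the reference-attention output."""
    d = f_lq.shape[-1]
    corr = f_lq @ summaries.T / np.sqrt(d)   # (N_lq, M) correlation logits
    attn = softmax(corr, axis=0)             # assumed axis: normalize over LQ tokens
    g = 1.0 / (1.0 + np.exp(-attn.mean(axis=-1)))  # mean over summaries, sigmoid
    return f_lq + g[:, None] * ra_out, g     # low g suppresses, high g reinforces

# Toy usage: 5 LQ tokens, 7 Ref tokens, 4 summary tokens, d = 8.
rng = np.random.default_rng(1)
d = 8
f_lq = rng.standard_normal((5, d))
f_ref = rng.standard_normal((7, d))
summaries = summarize_reference(f_ref, rng.standard_normal((4, d)))
ra_out = rng.standard_normal((5, d))  # stand-in for RA(F_LQ, F_Ref)
fused, g = aicg_gated_fusion(f_lq, summaries, ra_out)
```

With ra_out set to zero the fusion collapses to the identity on F_LQ, which mirrors the graceful fallback to single-image SR when the reference is unreliable.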
3. Training and Optimization
The Ada-RefSR framework fine-tunes only the reference attention and summary tokens, with the diffusion backbone maintained frozen. The objective is a composite of reconstruction, perceptual, and adversarial losses following S3Diff:
- : reconstruction error between model output and ground truth.
- : Perceptual loss using VGG feature-space distances.
- : Standard GAN loss encouraging naturalism.
Only 62M parameters for RA and 0.2M for the summary tokens are trained, out of a total of 2.68B.
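The composite objective can be sketched as a weighted sum of the three terms. The stand-in callables (`feat_fn` for a frozen VGG feature extractor, `disc_fn` for a discriminator) and the loss weights below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def composite_loss(sr, hq, feat_fn, disc_fn, w_rec=1.0, w_per=1.0, w_adv=0.05):
    """S3Diff-style composite objective sketch: reconstruction + perceptual
    + adversarial. All weights and the exact loss forms are placeholders."""
    l_rec = np.mean(np.abs(sr - hq))                   # pixel reconstruction (L1 here)
    l_per = np.mean((feat_fn(sr) - feat_fn(hq)) ** 2)  # perceptual, feature-space L2
    l_adv = np.mean(-np.log(disc_fn(sr) + 1e-8))       # non-saturating GAN term
    return w_rec * l_rec + w_per * l_per + w_adv * l_adv
```

During fine-tuning, gradients from this objective would flow only into the reference-attention weights and summary tokens, with the backbone frozen.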
4. Experimental Evaluation
4.1 Datasets
- Training: Synthetic subsets from DIV2K, DIV8K, Flickr2K (512×512), face RefSR set, and 20% irrelevant reference pairings for robustness.
- Evaluation: CUFED5, WRSR (scene-level), Bird retrieval (8,460 images, CLIP/DINOv2 retrieval), Face (162 pairs, 40 identities).
- RealSRGAN degradation model is applied.
4.2 Implementation
- The backbone is S3Diff (one-step diffusion, frozen).
- Adam optimizer, batch size 16, trained for 11k iterations on two NVIDIA A40 GPUs.
4.3 Quantitative Results
Ada-RefSR attains top scores across benchmarks compared to S3Diff and ReFIR*:
| Method | CUFED5 PSNR↑ | CUFED5 SSIM↑ | CUFED5 LPIPS↓ | WRSR PSNR↑ | WRSR SSIM↑ | WRSR LPIPS↓ |
|---|---|---|---|---|---|---|
| S3Diff | 20.46 | 0.5234 | 0.3544 | 21.91 | 0.5620 | 0.3542 |
| ReFIR* | 20.22 | 0.5255 | 0.3452 | 21.83 | 0.5673 | 0.3435 |
| Ada-RefSR* | 20.48 | 0.5461 | 0.2894 | 21.97 | 0.5777 | 0.3061 |
Corresponding gains are reported for Bird (PSNR 25.30, SSIM 0.729) and Face (PSNR 27.13, SSIM 0.752, LPIPS 0.175).
4.4 Qualitative and Robustness Analysis
Ada-RefSR produces sharper textures (e.g., bird feathers, logos) with fewer hallucinations or duplicated artifacts than explicit matching methods (e.g., ReFIR). The AICG mechanism enables the system to default gracefully to single-image SR when reference correlation is unreliable due to misalignment or degradations.
5. Ablation and Comparative Studies
Ablation experiments confirm the efficacy of AICG:
| Gating Mechanism | WRSR PSNR↑ | WRSR SSIM↑ | Face PSNR↑ | Face SSIM↑ |
|---|---|---|---|---|
| Vanilla | 21.95 | 0.5737 | 27.08 | 0.7495 |
| Global | 21.63 | 0.5610 | 27.06 | 0.7498 |
| ReFIR | 21.78 | 0.5668 | 26.94 | 0.7473 |
| AICG (Ours) | 21.97 | 0.5777 | 27.13 | 0.7523 |
Optimal performance was achieved with 16 learnable summary tokens. Ada-RefSR outperforms prior works such as PFStorer (global gating) and ReFIR (explicit similarity), as well as multi-step diffusion pipelines, while being approximately 30× faster than SeeSR+ReFIR.
6. Strengths, Limitations, and Future Directions
Strengths
- Adaptive, implicit gating offers robust protection against both over-utilization and under-utilization of references, outperforming explicit matching and global gating.
- The single-step diffusion backbone enables real-time super-resolution (0.41 s per image) with state-of-the-art fidelity.
- Robust to severe misalignments; performance degrades gracefully as reference reliability worsens.
Limitations
- Total model size (~2.7B parameters) exceeds that of pure S3Diff due to reference attention overhead.
- Gating operates at token granularity; a plausible implication is that finer (patch-wise or hierarchical) gating could further enhance performance.
Future Directions
- Exploration of patch-wise or hierarchical gating strategies for improved spatial adaptation.
- Introduction of lightweight adapters, including pruned tokens or sparse attention, to decrease inference costs.
- Extension of Ada-RefSR to video or multi-reference SR, possibly utilizing temporal consistency constraints.
For further technical details and complete references, see (Wang et al., 2 Feb 2026).