
Ada-RefSR: Adaptive Ref Diffusion SR

Updated 9 February 2026
  • The paper introduces Ada-RefSR, which employs Adaptive Implicit Correlation Gating (AICG) to verify and selectively fuse reference cues, mitigating issues from misaligned inputs.
  • It utilizes a single-step diffusion backbone that speeds up inference by over 30×, while a two-phase "Trust but Verify" approach preserves restoration fidelity.
  • Experimental results show Ada-RefSR outperforms prior methods on benchmarks like CUFED5 and WRSR, achieving higher PSNR, SSIM, and lower LPIPS scores.

Ada-RefSR is a reference-based diffusion super-resolution (RefSR) framework designed to address the challenges of leveraging unreliable or misaligned reference images in real-world image restoration. Guided by a "Trust but Verify" principle, Ada-RefSR adaptively fuses reference information, maximizing the use of useful cues while suppressing misleading content. The system introduces Adaptive Implicit Correlation Gating (AICG), which conditions reference fusion on implicit token-level correlations. Built on a single-step diffusion backbone, Ada-RefSR achieves a favorable combination of fidelity, efficiency, and robustness, outperforming prior explicit matching and global gating solutions in varied RefSR benchmarks (Wang et al., 2 Feb 2026).

1. The "Trust but Verify" Principle in RefSR

RefSR augments the low-quality (LQ) input with a high-resolution reference (Ref), providing guidance for generating visually plausible, high-frequency details. A central obstacle in practical settings is the unreliability of LQ–Ref correspondence: degradations, misalignments, and irrelevant retrievals often break semantic and spatial consistency. Excessive dependence on such reference cues introduces artifacts or hallucinations; insufficient use forfeits valuable information.

Ada-RefSR formalizes a two-phase protocol:

  • Trust: Aggressively inject reference patterns to capture all potentially relevant cues, with an emphasis on recall.
  • Verify: Apply adaptive verification to suppress semantically inconsistent or unreliable contributions, increasing precision.

This dual-phase philosophy is operationalized through architectural and algorithmic innovations centered around reference attention and adaptive gating.

2. Architectural Components and Methodology

2.1 Single-Step Diffusion Backbone

Ada-RefSR leverages a single-step super-resolution diffusion backbone (e.g., S3Diff distilled from Stable Diffusion), freezing its weights and using a single feedforward step for inference. This results in inference speeds over 30× faster than classical multi-step diffusion, while maintaining restoration priors.
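The frozen one-step setup can be sketched as follows. This is a minimal illustration, not the actual S3Diff model: `Backbone` is a hypothetical stand-in (the real backbone is a distilled Stable Diffusion UNet with additional conditioning), but it shows the key operational points — frozen weights and a single feedforward pass instead of an iterative sampling loop.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    """Hypothetical stand-in for a distilled one-step diffusion SR backbone."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.net(latent)

backbone = Backbone()
backbone.requires_grad_(False)  # frozen: only adapter modules are trained
backbone.eval()

lq_latent = torch.randn(1, 4, 64, 64)  # encoded low-quality input
with torch.no_grad():
    restored = backbone(lq_latent)     # one feedforward pass, no sampling loop
print(restored.shape)  # torch.Size([1, 4, 64, 64])
```

Because there is no multi-step denoising loop, latency is a single forward pass; this is where the reported 30×+ speedup over classical multi-step diffusion comes from.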

2.2 Reference Attention: Trust Phase

Feature representations from the LQ input ($H_\mathrm{src} \in \mathbb{R}^{L_\mathrm{src} \times d}$) and Ref ($H_\mathrm{ref} \in \mathbb{R}^{L_\mathrm{ref} \times d}$) are projected to queries, keys, and values:

$$Q = H_\mathrm{src} W_Q, \quad K = H_\mathrm{ref} W_K, \quad V = H_\mathrm{ref} W_V$$

Vanilla reference attention (RA) is computed as:

$$\mathrm{RA}(H_\mathrm{src}, H_\mathrm{ref}) = \mathrm{ZeroLinear}\!\left(\mathrm{Softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)V\right)$$

A residual addition of $H_\mathrm{src}$ preserves the prior from the diffusion backbone.
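A direct translation of these equations into PyTorch might look like the sketch below. The zero-initialized output projection means the module is an identity mapping at the start of training, so the frozen backbone's prior is untouched until the adapter learns to inject reference content.

```python
import math
import torch
import torch.nn as nn

class ZeroLinear(nn.Linear):
    """Linear layer initialized to zero so the fused branch starts as a no-op."""
    def __init__(self, dim: int):
        super().__init__(dim, dim)
        nn.init.zeros_(self.weight)
        nn.init.zeros_(self.bias)

class ReferenceAttention(nn.Module):
    """Vanilla reference attention (the 'Trust' phase), per the equations above."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # W_Q
        self.w_k = nn.Linear(dim, dim, bias=False)  # W_K
        self.w_v = nn.Linear(dim, dim, bias=False)  # W_V
        self.zero_out = ZeroLinear(dim)
        self.dim = dim

    def forward(self, h_src: torch.Tensor, h_ref: torch.Tensor) -> torch.Tensor:
        q, k, v = self.w_q(h_src), self.w_k(h_ref), self.w_v(h_ref)
        attn = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(self.dim), dim=-1)
        # Residual addition of H_src preserves the frozen diffusion prior.
        return h_src + self.zero_out(attn @ v)

ra = ReferenceAttention(dim=64)
h_src = torch.randn(1, 256, 64)   # L_src tokens from the LQ branch
h_ref = torch.randn(1, 1024, 64)  # L_ref tokens from the reference
out = ra(h_src, h_ref)
print(out.shape)  # torch.Size([1, 256, 64])
```

At initialization `out` equals `h_src` exactly, since the zero-initialized projection suppresses the attention branch.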

2.3 Adaptive Implicit Correlation Gating (AICG): Verify Phase

AICG mitigates erroneous fusion by estimating token-wise reliability:

  • Reference Summarization: Learnable summary tokens $T_S \in \mathbb{R}^{M \times d}$ distill major patterns from $K$ via attention aggregation.
  • Correlation and Gating: Softmax-based attention between $Q$ and the token summaries yields $S_\mathrm{map}$; the per-token gate $G$ is computed as the average attention over summary tokens, followed by an elementwise sigmoid.
  • Gated Fusion: The reference attention output is modulated by $G$:

$$H_\mathrm{out} = H_\mathrm{src} + \mathrm{ZeroLinear}\!\left(G \odot \mathrm{RA}(H_\mathrm{src}, H_\mathrm{ref})\right)$$

Mismatched regions (low $G$) are suppressed; reliable matches (high $G$) are reinforced.

An equivalent interpretation treats $G$ as a modulation of the key/value contributions within the standard attention update.
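The gating mechanism can be sketched as follows. This follows one plausible reading of the description above — summary tokens attend over the reference keys, queries are correlated with the resulting summaries, and the averaged correlation is squashed into a per-token gate — but the exact softmax placement and normalization are assumptions and may differ from the paper's implementation.

```python
import math
import torch
import torch.nn as nn

class AICG(nn.Module):
    """Sketch of Adaptive Implicit Correlation Gating (the 'Verify' phase)."""
    def __init__(self, dim: int, num_summary: int = 16):
        super().__init__()
        # Learnable summary tokens T_S, shape (M, d)
        self.summary = nn.Parameter(torch.randn(num_summary, dim) / math.sqrt(dim))
        self.dim = dim

    def forward(self, q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        # Summarize the reference: each summary token attends over the keys.
        attn = torch.softmax(self.summary @ k.transpose(-1, -2) / math.sqrt(self.dim), dim=-1)
        agg = attn @ k                                          # (B, M, d)
        # Correlate queries with the summaries (S_map), average over the
        # M summary tokens, and squash into a per-token gate in (0, 1).
        s_map = q @ agg.transpose(-1, -2) / math.sqrt(self.dim)  # (B, L_src, M)
        gate = torch.sigmoid(s_map.mean(dim=-1, keepdim=True))   # (B, L_src, 1)
        return gate

aicg = AICG(dim=64)
q = torch.randn(1, 256, 64)
k = torch.randn(1, 1024, 64)
gate = aicg(q, k)
ra_out = torch.randn(1, 256, 64)  # reference-attention output (pre-residual)
gated = gate * ra_out             # gated fusion: G ⊙ RA(H_src, H_ref)
print(gate.shape)  # torch.Size([1, 256, 1])
```

Tokens whose queries correlate poorly with the reference summaries receive gates near zero, which realizes the graceful fallback to single-image SR described in Section 4.4.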

3. Training and Optimization

The Ada-RefSR framework fine-tunes only the reference attention and summary tokens, keeping the diffusion backbone frozen. The objective is a composite of reconstruction, perceptual, and adversarial losses, following S3Diff:

$$\mathcal{L}_\mathrm{total} = \lambda_1 \mathcal{L}_\mathrm{rec} + \lambda_2 \mathcal{L}_\mathrm{per} + \lambda_3 \mathcal{L}_\mathrm{adv}$$

  • $\mathcal{L}_\mathrm{rec}$: $L_2$ reconstruction error between model output and ground truth.
  • $\mathcal{L}_\mathrm{per}$: Perceptual loss using VGG feature-space distances.
  • $\mathcal{L}_\mathrm{adv}$: Standard GAN loss encouraging naturalism.

Only 62M parameters for RA and 0.2M for $T_S$ are trained, out of a total 2.68B.
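The composite objective can be written as a short sketch. The `perceptual` and `adversarial` functions below are placeholders for the VGG feature-space loss and GAN discriminator loss respectively, which are assumed components and not implemented here; the weights `lambdas` are illustrative, not the paper's values.

```python
import torch
import torch.nn.functional as F

def perceptual(output, target):
    # Placeholder: a real implementation compares VGG feature maps.
    return F.l1_loss(output, target)

def adversarial(output):
    # Placeholder: a real implementation uses discriminator logits.
    return output.mean() * 0.0

def total_loss(output, target, lambdas=(1.0, 1.0, 0.1)):
    """Composite objective: weighted reconstruction + perceptual + adversarial."""
    l_rec = F.mse_loss(output, target)   # L2 reconstruction error
    l_per = perceptual(output, target)   # perceptual term
    l_adv = adversarial(output)          # adversarial term
    l1, l2, l3 = lambdas
    return l1 * l_rec + l2 * l_per + l3 * l_adv

out = torch.rand(1, 3, 64, 64)
gt = torch.rand(1, 3, 64, 64)
loss = total_loss(out, gt)
```

In training, this loss would be backpropagated only into the RA and summary-token parameters, since the backbone is frozen.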

4. Experimental Evaluation

4.1 Datasets

  • Training: Synthetic subsets from DIV2K, DIV8K, and Flickr2K (512×512), a face RefSR set, and 20% irrelevant reference pairings for robustness.
  • Evaluation: CUFED5, WRSR (scene-level), Bird retrieval (8,460 images, CLIP/DINOv2 retrieval), Face (162 pairs, 40 identities).
  • The Real-ESRGAN degradation model is applied.

4.2 Implementation

  • The backbone is S3Diff (one-step diffusion, frozen).
  • Adam optimizer with learning rate $5 \times 10^{-5}$, batch size 16, for 11k iterations on two NVIDIA A40 GPUs.

4.3 Quantitative Results

Ada-RefSR attains top scores across benchmarks compared to S3Diff and ReFIR*:

| Method | CUFED5 PSNR↑ | CUFED5 SSIM↑ | CUFED5 LPIPS↓ | WRSR PSNR↑ | WRSR SSIM↑ | WRSR LPIPS↓ |
|---|---|---|---|---|---|---|
| S3Diff | 20.46 | 0.5234 | 0.3544 | 21.91 | 0.5620 | 0.3542 |
| ReFIR* | 20.22 | 0.5255 | 0.3452 | 21.83 | 0.5673 | 0.3435 |
| Ada-RefSR* | 20.48 | 0.5461 | 0.2894 | 21.97 | 0.5777 | 0.3061 |

Corresponding gains are reported for Bird (PSNR 25.30, SSIM 0.729) and Face (PSNR 27.13, SSIM 0.752, LPIPS 0.175).

4.4 Qualitative and Robustness Analysis

Ada-RefSR produces sharper textures (e.g., bird feathers, logos) with fewer hallucinations or duplicated artifacts than explicit matching methods (e.g., ReFIR). The AICG mechanism enables the system to default gracefully to single-image SR when reference correlation is unreliable due to misalignment or degradations.

5. Ablation and Comparative Studies

Ablation experiments confirm the efficacy of AICG:

| Gating Mechanism | WRSR PSNR↑ | WRSR SSIM↑ | Face PSNR↑ | Face SSIM↑ |
|---|---|---|---|---|
| Vanilla | 21.95 | 0.5737 | 27.08 | 0.7495 |
| Global | 21.63 | 0.5610 | 27.06 | 0.7498 |
| ReFIR | 21.78 | 0.5668 | 26.94 | 0.7473 |
| AICG (Ours) | 21.97 | 0.5777 | 27.13 | 0.7523 |

Optimal performance was achieved with 16 learnable summary tokens. Ada-RefSR outperforms prior works such as PFStorer (global gating) and ReFIR (explicit similarity), as well as multi-step diffusion pipelines, while being approximately 30× faster than SeeSR+ReFIR at $1024^2$ resolution.

6. Strengths, Limitations, and Future Directions

Strengths

  • Adaptive, implicit gating offers robust protection against both over-utilization and under-utilization of references, outperforming explicit matching and global gating.
  • The single-step diffusion backbone enables real-time super-resolution (0.41 s at $512^2$) with state-of-the-art fidelity.
  • Robust to severe misalignments; performance degrades gracefully as reference reliability worsens.

Limitations

  • Total model size (~2.7B parameters) exceeds that of pure S3Diff due to reference attention overhead.
  • Gating operates at token granularity; a plausible implication is that finer (patch-wise or hierarchical) gating could further enhance performance.

Future Directions

  • Exploration of patch-wise or hierarchical gating strategies for improved spatial adaptation.
  • Introduction of lightweight adapters, including pruned tokens or sparse attention, to decrease inference costs.
  • Extension of Ada-RefSR to video or multi-reference SR, possibly utilizing temporal consistency constraints.

For further technical details and complete references, see (Wang et al., 2 Feb 2026).
