Ada-RefSR: Adaptive Ref Diffusion SR
- The paper introduces Ada-RefSR, which employs Adaptive Implicit Correlation Gating (AICG) to verify and selectively fuse reference cues, mitigating issues from misaligned inputs.
- It utilizes a single-step diffusion backbone that speeds up inference by over 30× while preserving restoration fidelity through a two-phase 'Trust but Verify' approach.
- Experimental results show Ada-RefSR outperforms prior methods on benchmarks like CUFED5 and WRSR, achieving higher PSNR, SSIM, and lower LPIPS scores.
Ada-RefSR is a reference-based diffusion super-resolution (RefSR) framework designed to address the challenges of leveraging unreliable or misaligned reference images in real-world image restoration. Guided by a "Trust but Verify" principle, Ada-RefSR adaptively fuses reference information, maximizing the use of useful cues while suppressing misleading content. The system introduces Adaptive Implicit Correlation Gating (AICG), which conditions reference fusion on implicit token-level correlations. Built on a single-step diffusion backbone, Ada-RefSR achieves a favorable combination of fidelity, efficiency, and robustness, outperforming prior explicit matching and global gating solutions in varied RefSR benchmarks (Wang et al., 2 Feb 2026).
1. The "Trust but Verify" Principle in RefSR
RefSR augments the low-quality (LQ) input with a high-resolution reference (Ref), providing guidance for generating visually plausible, high-frequency details. A predominant obstacle in practical settings is the unreliability of LQ–Ref correspondence—degradations, misalignments, and irrelevant retrievals often break semantic and spatial consistency. Excessive dependence on such reference cues introduces artifacts or hallucinations; insufficient use forfeits valuable information.
Ada-RefSR formalizes a two-phase protocol:
- Trust: Aggressively inject reference patterns to capture all potentially relevant cues, with an emphasis on recall.
- Verify: Apply adaptive verification to suppress semantically inconsistent or unreliable contributions, increasing precision.
This dual-phase philosophy is operationalized through architectural and algorithmic innovations centered around reference attention and adaptive gating.
2. Architectural Components and Methodology
2.1 Single-Step Diffusion Backbone
Ada-RefSR leverages a single-step super-resolution diffusion backbone (e.g., S3Diff distilled from Stable Diffusion), freezing its weights and using a single feedforward step for inference. This results in inference speeds over 30× faster than classical multi-step diffusion, while maintaining restoration priors.
2.2 Reference Attention: Trust Phase
Feature representations from the LQ input (F_LQ) and Ref (F_Ref) are projected to queries, keys, and values: Q = F_LQ·W_Q, K = F_Ref·W_K, V = F_Ref·W_V. Vanilla reference attention (RA) is computed as RA(F_LQ, F_Ref) = softmax(Q·Kᵀ/√d)·V. A residual addition of F_LQ preserves the prior from the diffusion backbone.
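The Trust-phase update can be sketched as cross-attention with a residual connection. The following is a minimal NumPy illustration, not the paper's implementation; the projection matrices and shapes are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reference_attention(f_lq, f_ref, w_q, w_k, w_v):
    """Trust phase: queries from LQ features, keys/values from Ref features,
    plus a residual that preserves the frozen diffusion backbone's prior.
    All weight matrices here are illustrative placeholders."""
    q, k, v = f_lq @ w_q, f_ref @ w_k, f_ref @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N_lq, N_ref)
    return f_lq + attn @ v  # residual addition keeps the backbone prior

# Toy usage with random tokens (d = 8 channels).
rng = np.random.default_rng(0)
d = 8
f_lq = rng.standard_normal((5, d))   # 5 LQ tokens
f_ref = rng.standard_normal((7, d))  # 7 Ref tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = reference_attention(f_lq, f_ref, w_q, w_k, w_v)
```

Note that if the value projection contributes nothing, the residual leaves F_LQ untouched, which is what lets the backbone prior survive aggressive injection.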
2.3 Adaptive Implicit Correlation Gating (AICG): Verify Phase
AICG mitigates erroneous fusion by estimating token-wise reliability:
- Reference Summarization: Learnable summary tokens S distill major patterns from F_Ref via attention aggregation.
- Correlation and Gating: Softmax-based attention between F_LQ and the summary tokens yields a correlation map A; the per-token gate g is computed as the average attention over summary tokens, followed by an elementwise sigmoid.
- Gated Fusion: The reference attention output is modulated by g: F_out = F_LQ + g ⊙ RA(F_LQ, F_Ref). Mismatched regions (low g) are suppressed; reliable matches (high g) are reinforced.
An equivalent interpretation treats g as a modulation of the key/value contributions within the standard attention update.
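The Verify-phase steps above can be sketched as follows. This is a hedged NumPy reading of the mechanism, not the paper's code: the summarization is written as one cross-attention pass, and the normalization axis for the correlation map is our assumption (we normalize over LQ tokens so the average over summary tokens varies per token):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def summarize_reference(f_ref, s_init):
    """Learnable summary tokens attend over Ref features (attention aggregation)."""
    d = f_ref.shape[-1]
    a = softmax(s_init @ f_ref.T / np.sqrt(d), axis=-1)  # (M, N_ref)
    return a @ f_ref                                     # (M, d) distilled summaries

def aicg_gated_fusion(f_lq, summaries, ra_out):
    """Verify phase: per-token gate g from LQ-vs-summary attention,
    then gated residual fusion of the reference-attention output."""
    d = f_lq.shape[-1]
    corr = f_lq @ summaries.T / np.sqrt(d)   # (N_lq, M) correlation logits
    attn = softmax(corr, axis=0)             # assumed axis: normalize over LQ tokens
    g = 1.0 / (1.0 + np.exp(-attn.mean(axis=-1)))  # mean over summaries, sigmoid
    return f_lq + g[:, None] * ra_out, g     # low g suppresses, high g reinforces

# Toy usage: 5 LQ tokens, 7 Ref tokens, 4 summary tokens, d = 8.
rng = np.random.default_rng(1)
d = 8
f_lq = rng.standard_normal((5, d))
f_ref = rng.standard_normal((7, d))
summaries = summarize_reference(f_ref, rng.standard_normal((4, d)))
ra_out = rng.standard_normal((5, d))  # stand-in for RA(F_LQ, F_Ref)
fused, g = aicg_gated_fusion(f_lq, summaries, ra_out)
```

With ra_out set to zero the fusion collapses to the identity on F_LQ, which mirrors the graceful fallback to single-image SR when the reference is unreliable.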
3. Training and Optimization
The Ada-RefSR framework fine-tunes only the reference attention and summary tokens, with the diffusion backbone maintained frozen. The objective is a composite of reconstruction, perceptual, and adversarial losses following S3Diff:
- : reconstruction error between model output and ground truth.
- : Perceptual loss using VGG feature-space distances.
- : Standard GAN loss encouraging naturalism.
Only 62M parameters for RA and 0.2M for the summary tokens are trained, out of a total of 2.68B.
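The composite objective can be sketched as a weighted sum of the three terms. The stand-in callables (`feat_fn` for a frozen VGG feature extractor, `disc_fn` for a discriminator) and the loss weights below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def composite_loss(sr, hq, feat_fn, disc_fn, w_rec=1.0, w_per=1.0, w_adv=0.05):
    """S3Diff-style composite objective sketch: reconstruction + perceptual
    + adversarial. All weights and the exact loss forms are placeholders."""
    l_rec = np.mean(np.abs(sr - hq))                   # pixel reconstruction (L1 here)
    l_per = np.mean((feat_fn(sr) - feat_fn(hq)) ** 2)  # perceptual, feature-space L2
    l_adv = np.mean(-np.log(disc_fn(sr) + 1e-8))       # non-saturating GAN term
    return w_rec * l_rec + w_per * l_per + w_adv * l_adv
```

During fine-tuning, gradients from this objective would flow only into the reference-attention weights and summary tokens, with the backbone frozen.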
4. Experimental Evaluation
4.1 Datasets
- Training: Synthetic subsets from DIV2K, DIV8K, Flickr2K (512×512), face RefSR set, and 20% irrelevant reference pairings for robustness.
- Evaluation: CUFED5, WRSR (scene-level), Bird retrieval (8,460 images, CLIP/DINOv2 retrieval), Face (162 pairs, 40 identities).
- RealSRGAN degradation model is applied.
4.2 Implementation
- The backbone is S3Diff (one-step diffusion, frozen).
- Adam optimizer, batch size 16, trained for 11k iterations on two NVIDIA A40 GPUs.
4.3 Quantitative Results
Ada-RefSR attains top scores across benchmarks compared to S3Diff and ReFIR*:
| Method | CUFED5 PSNR↑ | CUFED5 SSIM↑ | CUFED5 LPIPS↓ | WRSR PSNR↑ | WRSR SSIM↑ | WRSR LPIPS↓ |
|---|---|---|---|---|---|---|
| S3Diff | 20.46 | 0.5234 | 0.3544 | 21.91 | 0.5620 | 0.3542 |
| ReFIR* | 20.22 | 0.5255 | 0.3452 | 21.83 | 0.5673 | 0.3435 |
| Ada-RefSR* | 20.48 | 0.5461 | 0.2894 | 21.97 | 0.5777 | 0.3061 |
Corresponding gains are reported for Bird (PSNR 25.30, SSIM 0.729) and Face (PSNR 27.13, SSIM 0.752, LPIPS 0.175).
4.4 Qualitative and Robustness Analysis
Ada-RefSR produces sharper textures (e.g., bird feathers, logos) with fewer hallucinations or duplicated artifacts than explicit matching methods (e.g., ReFIR). The AICG mechanism enables the system to default gracefully to single-image SR when reference correlation is unreliable due to misalignment or degradations.
5. Ablation and Comparative Studies
Ablation experiments confirm the efficacy of AICG:
| Gating Mechanism | WRSR PSNR↑ | WRSR SSIM↑ | Face PSNR↑ | Face SSIM↑ |
|---|---|---|---|---|
| Vanilla | 21.95 | 0.5737 | 27.08 | 0.7495 |
| Global | 21.63 | 0.5610 | 27.06 | 0.7498 |
| ReFIR | 21.78 | 0.5668 | 26.94 | 0.7473 |
| AICG (Ours) | 21.97 | 0.5777 | 27.13 | 0.7523 |
Optimal performance was achieved with 16 learnable summary tokens. Ada-RefSR outperforms prior works such as PFStorer (global gating) and ReFIR (explicit similarity), as well as multi-step diffusion pipelines, while being approximately 30× faster than SeeSR+ReFIR.
6. Strengths, Limitations, and Future Directions
Strengths
- Adaptive, implicit gating offers robust protection against both over-utilization and under-utilization of references, outperforming explicit matching and global gating.
- The single-step diffusion backbone enables real-time super-resolution (0.41 s per image) with state-of-the-art fidelity.
- Robust to severe misalignments; performance degrades gracefully as reference reliability worsens.
Limitations
- Total model size (~2.7B parameters) exceeds that of pure S3Diff due to reference attention overhead.
- Gating operates at token granularity; a plausible implication is that finer (patch-wise or hierarchical) gating could further enhance performance.
Future Directions
- Exploration of patch-wise or hierarchical gating strategies for improved spatial adaptation.
- Introduction of lightweight adapters, including pruned tokens or sparse attention, to decrease inference costs.
- Extension of Ada-RefSR to video or multi-reference SR, possibly utilizing temporal consistency constraints.
For further technical details and complete references, see (Wang et al., 2 Feb 2026).