
GuideSR: Dual-Branch Diffusion SR

Updated 18 May 2026
  • GuideSR is a diffusion-based image super-resolution approach that integrates a full-resolution Guidance Branch with a latent diffusion branch to preserve structure and enhance perceptual quality.
  • It employs a dual-branch design where the Guidance Branch retains fine structural details and the Diffusion Branch, enhanced via LoRA tuning, boosts global perceptual metrics.
  • GuideSR achieves state-of-the-art performance on multiple benchmarks while offering efficient single-step inference for real-time image restoration.

GuideSR is a single-step diffusion-based image super-resolution (SR) architecture designed for high-fidelity restoration of degraded inputs. Prior single-step methods condition restoration on variational autoencoder (VAE) encodings of the input, which often costs structural fidelity. GuideSR instead uses a dual-branch design: a full-resolution Guidance Branch dedicated to structure preservation, and a Diffusion Branch that leverages a pretrained latent diffusion model to enhance perceptual quality. The two branches are trained jointly, and GuideSR reports state-of-the-art results across multiple SR benchmarks while keeping single-step efficiency (Arora et al., 1 May 2025).

1. Motivation and Limitations of Prior Single-Step Diffusion SR

Image super-resolution (SR) targets the estimation of a high-resolution (HR) image $Y$ from a degraded low-resolution (LR) input $I$. Diffusion-based SR methods such as SR3, StableSR, and DiffBIR have established diffusion priors as powerful tools for this task, but their requirement for tens to hundreds of denoising steps makes them impractical for real-time applications. Recent single-step approaches, including SinSR and OSEDiff, compress this process into a single denoiser pass by conditioning on a VAE-encoded latent of the LR image. This strategy has a cost: aggressive VAE downsampling (e.g., $8\times$) erases fine structural content, and because the VAE was trained on high-quality images, it encodes heavily degraded inputs poorly. Consequently, these models tend to hallucinate generic textures at the expense of image-specific detail. GuideSR addresses this by extracting guidance at full spatial resolution and fusing the resulting structural features directly into the restoration pipeline.

2. Architecture Overview

GuideSR adopts a dual-branch paradigm, with both branches trained jointly and supervised by a shared adversarial discriminator, but only the Diffusion Branch output is used during inference. The two branches are:

  • Guidance Branch: Maintains full spatial resolution to retain structural fidelity.
  • Diffusion Branch: Utilizes a latent-space diffusion model to boost perceptual metrics.

The two branches at a glance:

| Branch | Domain | Key Components |
|---|---|---|
| Guidance Branch | Pixel space | FRBs, channel attention, IGN, PixelUnshuffle |
| Diffusion Branch | VAE latent space | Stable Diffusion Turbo (v2.1) VAE, prompt extractor, UNet, LoRA finetuning |

This division enables the model to harness both detailed structure from the original LR input and global, perceptually plausible enhancements from diffusion modeling.

3. Architectural Details

3.1 Guidance Branch

Input: $I\in\mathbb{R}^{H\times W\times 3}$

  • Feature Extraction: $F_0 = \operatorname{Conv}_{3\rightarrow C}(I)$
  • Deep Encoding: a sequence of $n$ Full Resolution Blocks (FRBs) with a residual-in-residual design and channel attention:

$$F_d = \operatorname{FRGNet}(F_0) = \operatorname{FRB}_n\circ\cdots\circ\operatorname{FRB}_1(F_0)$$

Each FRB:

$$\operatorname{FRB}(X) = X + \operatorname{Conv}_{M\to M}\bigl(X\odot a(X)\bigr), \quad a(X) = \sigma\Bigl(W_2\,\operatorname{GELU}\bigl(W_1\,\operatorname{GAP}(X)\bigr)\Bigr)$$

where $\operatorname{GAP}$ is global average pooling, $\sigma$ is the sigmoid, and $W_1, W_2$ are learned projections.

  • Image Guidance Network (IGN): a guided-attention mechanism:

$$A = \sigma\bigl(\operatorname{Conv}_{2C\to 2C}(F_d)\bigr), \quad G = \operatorname{Conv}_{2C\to 2C}(F_d), \quad F_r = F_d + A \odot G$$

Guidance Branch output image (a residual prediction added to the input):

$$R_2 = \operatorname{Conv}_{2C\to 3}(F_r) + I$$

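  • Cross-Scale Feature Fusion: pixel-unshuffle downsampling turns $F_r$ into multi-scale structural features:

$$F'_r = \operatorname{PixelUnshuffle}(F_r, s)$$

For a downsampling factor $s$, PixelUnshuffle rearranges each $s\times s$ block of pixels into channels, so the spatial size shrinks by $s$ while the channel count grows by $s^2$ and no information is discarded. The resulting features are concatenated with the UNet encoder features of the Diffusion Branch at matching spatial resolutions.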
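The Guidance Branch equations can be traced with a small NumPy sketch. It is illustrative only, and several choices are assumptions not taken from the source: every convolution is a 1×1 convolution (a per-pixel matrix multiply on channels-last arrays), a single channel width C replaces the C/M/2C widths in the formulas, GELU uses its tanh approximation, weights are random, and the fusion scales (8, 16) are chosen so the first feature map matches an 8× downsampled latent. The names `FRB`, `IGN`, `pixel_unshuffle`, and `guidance_branch` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def weight(c_in, c_out, scale=0.1):
    # Random projection used in place of trained parameters (illustrative only).
    return rng.normal(0.0, scale, (c_in, c_out))

def conv1x1(x, w):
    # Assumed 1x1 convolution on a channels-last (H, W, C_in) array.
    return x @ w

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FRB:
    # FRB(X) = X + Conv(X * a(X)),  a(X) = sigmoid(W2 GELU(W1 GAP(X)))
    def __init__(self, c, reduction=4):
        self.w1 = weight(c, c // reduction)
        self.w2 = weight(c // reduction, c)
        self.wc = weight(c, c)

    def __call__(self, x):
        gap = x.mean(axis=(0, 1))                    # global average pool -> (C,)
        a = sigmoid(gelu(gap @ self.w1) @ self.w2)   # channel weights in (0, 1)
        return x + conv1x1(x * a, self.wc)           # residual connection

class IGN:
    # F_r = F_d + A * G,  A = sigmoid(Conv(F_d)),  G = Conv(F_d)
    def __init__(self, c):
        self.wa = weight(c, c)
        self.wg = weight(c, c)

    def __call__(self, fd):
        a = sigmoid(conv1x1(fd, self.wa))            # attention map A
        g = conv1x1(fd, self.wg)                     # guidance features G
        return fd + a * g

def pixel_unshuffle(x, s):
    # (H, W, C) -> (H/s, W/s, C*s*s): each s x s block moves into channels, losslessly.
    h, w, c = x.shape
    assert h % s == 0 and w % s == 0
    x = x.reshape(h // s, s, w // s, s, c).transpose(0, 2, 4, 1, 3)
    return x.reshape(h // s, w // s, c * s * s)

def guidance_branch(img, c=16, n_blocks=4, scales=(8, 16)):
    f = conv1x1(img, weight(3, c))                   # F_0 = Conv_{3->C}(I)
    for block in [FRB(c) for _ in range(n_blocks)]:  # F_d = FRB_n o ... o FRB_1(F_0)
        f = block(f)
    fr = IGN(c)(f)                                   # F_r
    r2 = conv1x1(fr, weight(c, 3)) + img             # R_2 = Conv_{C->3}(F_r) + I
    feats = [pixel_unshuffle(fr, s) for s in scales] # F'_r at each fusion scale
    return r2, feats

img = rng.random((64, 64, 3))
r2, feats = guidance_branch(img)
print(r2.shape, [f.shape for f in feats])
```

For a 64×64 input with C = 16, the output image keeps the input shape, and the two fused feature maps have shapes 8×8×1024 and 4×4×4096, since each scale s multiplies the channel count by s².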

3.2 Diffusion Branch

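Input: the same LR image $I$.

  • VAE Encoding: $z = \operatorname{Enc}(I)$, yielding a latent of shape $\tfrac{H}{8}\times\tfrac{W}{8}\times 4$
  • Prompt Features: $c = \operatorname{PromptExtractor}(I)$
  • Single-Step UNet Denoising: the UNet, finetuned with LoRA adapters, is evaluated once at a fixed timestep $t$, with the cross-scale features $F'_r$ from the Guidance Branch concatenated into its encoder:

$$\hat{z} = \operatorname{UNet}(z, t, c, F'_r)$$

  • Long-Skip Residual: $z' = z + \hat{z}$
  • VAE Decoding: $R_1 = \operatorname{Dec}(z')$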

LoRA adapters are attached to the UNet and VAE for parameter-efficient finetuning: the pretrained weights stay frozen, and only the low-rank adapter weights are trained.
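LoRA (low-rank adaptation) keeps a pretrained weight $W$ frozen and learns a low-rank update, so that the adapted layer computes $Wx + \tfrac{\alpha}{r}BAx$. The NumPy sketch below shows one adapted linear layer. The rank, the scaling α, the layer size, and the `LoRALinear` name are illustrative assumptions, because the source does not state which layers or ranks GuideSR uses.

```python
import numpy as np

class LoRALinear:
    """Linear layer with a frozen weight W and a trainable low-rank update B @ A."""

    def __init__(self, w, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w.shape
        self.w = w                                    # frozen pretrained weight
        self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable down-projection A
        self.b = np.zeros((d_out, rank))              # trainable up-projection B, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (..., d_in) -> (..., d_out); frozen path plus scaled low-rank path
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

    def num_trainable(self):
        return self.a.size + self.b.size

rng = np.random.default_rng(1)
w = rng.normal(size=(320, 320))   # hypothetical pretrained projection
layer = LoRALinear(w, rank=4)
x = rng.normal(size=(10, 320))
print(np.allclose(layer(x), x @ w.T))  # zero-init B leaves the layer unchanged at start
print(layer.num_trainable(), w.size)
```

With rank 4 on a 320×320 weight, 2,560 parameters are trainable, which is 2.5% of the 102,400 in $W$. Zero-initializing $B$ means finetuning starts from the pretrained behaviour exactly.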

4. Training Strategy and Loss Functions

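Both branches produce predictions ($R_1$ from the Diffusion Branch, $R_2$ from the Guidance Branch), and both are supervised jointly with a shared discriminator $D$:

  • Restoration Loss per Branch:

$$\mathcal{L}_i = \lambda_{\text{MSE}}\,\mathcal{L}_{\text{MSE}}(R_i, Y) + \lambda_{\text{LPIPS}}\,\mathcal{L}_{\text{LPIPS}}(R_i, Y) + \lambda_{\text{GAN}}\,\mathcal{L}_{\text{GAN}}(R_i; D)$$

for $i \in \{1, 2\}$. The terms are mean squared error (MSE), Learned Perceptual Image Patch Similarity (LPIPS), and an adversarial (GAN) loss computed with the shared discriminator $D$.

  • Final Loss Function:

$$\mathcal{L} = \lambda_1\,\mathcal{L}_1 + \lambda_2\,\mathcal{L}_2$$

where $\lambda_1$ and $\lambda_2$ weight the Diffusion Branch and Guidance Branch losses.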
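The following is a minimal sketch of how the per-branch losses could combine, assuming plain weighted sums. The weights (`w_mse`, `w_lpips`, `w_gan`, `lam1`, `lam2`) are hypothetical placeholders, not GuideSR's values. LPIPS and the adversarial term are passed in as callables because both need trained networks. The non-saturating softplus generator loss used in `gan_from_discriminator` is one common choice and is assumed here, not taken from the source.

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def branch_loss(pred, target, lpips_fn, gan_fn, w_mse=1.0, w_lpips=1.0, w_gan=0.1):
    # Restoration loss L_i for one branch: weighted MSE + LPIPS + adversarial terms.
    return (w_mse * mse(pred, target)
            + w_lpips * lpips_fn(pred, target)
            + w_gan * gan_fn(pred))

def total_loss(r1, r2, target, lpips_fn, gan_fn, lam1=1.0, lam2=1.0):
    # Both branches are scored against the same target, with the same LPIPS
    # network and the same shared discriminator inside gan_fn.
    return (lam1 * branch_loss(r1, target, lpips_fn, gan_fn)
            + lam2 * branch_loss(r2, target, lpips_fn, gan_fn))

def gan_from_discriminator(disc):
    # Assumed non-saturating generator loss: softplus(-D(x)) = log(1 + exp(-D(x))).
    return lambda pred: float(np.logaddexp(0.0, -disc(pred)))

target = np.ones((8, 8, 3))
r1 = np.zeros_like(target)            # stand-in Diffusion Branch output
r2 = np.full_like(target, 0.5)        # stand-in Guidance Branch output
no_lpips = lambda p, t: 0.0           # placeholder for a real LPIPS network
toy_disc = lambda p: float(p.mean())  # toy discriminator score, not a trained D
print(total_loss(r1, r2, target, no_lpips, gan_from_discriminator(toy_disc)))
```

Because the discriminator is shared, a single adversarial signal pushes both branch outputs toward the same notion of realism.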

5. Inference Procedure

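GuideSR performs inference in a single pass. The Guidance Branch processes the LR image and supplies its cross-scale features. The VAE encodes the image, and the LoRA-finetuned UNet runs once at its fixed timestep, conditioned on the prompt features and the guidance features. The long-skip residual is then applied, and the VAE decodes the result.

Inference therefore costs one UNet evaluation plus one VAE encode and one VAE decode, and only the Diffusion Branch output is returned. The Guidance Branch still runs to provide features, but its own output image $R_2$ is not used at inference.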
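The single-pass data flow can be sketched with hypothetical stand-in components that model only array shapes. This makes it possible to check the data flow and the one-UNet-call property. The following are illustrative assumptions, not the authors' code:

- all names (`Stubs`, `guidance_features`, `vae_encode`, `prompt_extractor`, `unet`, `vae_decode`, `guidesr_infer`);
- the timestep value;
- the guidance feature width;
- the latent-space long skip `z + z_hat`;
- the premise that the input is already at the output resolution.

```python
import numpy as np

class Stubs:
    """Hypothetical stand-ins for the real networks; they model array shapes only."""

    def __init__(self):
        self.unet_calls = 0

    def guidance_features(self, img):
        # Guidance Branch: F'_r at the latent resolution (channel count assumed)
        h, w, _ = img.shape
        return np.zeros((h // 8, w // 8, 64))

    def vae_encode(self, img):
        # 8x spatial downsampling into a 4-channel latent
        h, w, _ = img.shape
        return np.zeros((h // 8, w // 8, 4))

    def prompt_extractor(self, img):
        return np.zeros((1, 8))  # placeholder conditioning features

    def unet(self, z, t, cond, guide):
        self.unet_calls += 1
        assert guide.shape[:2] == z.shape[:2]  # fused features must align spatially
        return np.zeros_like(z)

    def vae_decode(self, z):
        h, w, _ = z.shape
        return np.zeros((h * 8, w * 8, 3))

def guidesr_infer(img, m, t=999):
    # t=999 is a hypothetical fixed timestep, not a value taken from the source.
    feats = m.guidance_features(img)  # Guidance Branch runs; its image R_2 is unused
    z = m.vae_encode(img)
    c = m.prompt_extractor(img)
    z_hat = m.unet(z, t, c, feats)    # the single denoising step
    return m.vae_decode(z + z_hat)    # assumed latent long skip, then decode -> R_1

m = Stubs()
out = guidesr_infer(np.zeros((256, 256, 3)), m)
print(out.shape, m.unet_calls)
```

Swapping the stubs for real modules would leave `guidesr_infer` unchanged, which keeps the single-step structure explicit.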

6. Empirical Performance

GuideSR is evaluated on a widely used synthetic benchmark (DIV2K-Val) and two real-world benchmarks (DRealSR, RealSR). GuideSR uses a single UNet step in every evaluation, while the multi-step baseline ResShift uses 15. Comparative quantitative results are summarized as follows:

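| Dataset | Method | Steps | PSNR↑ (dB) | SSIM↑ | LPIPS↓ | DISTS↓ | FID↓ |
|---|---|---|---|---|---|---|---|
| DIV2K-Val | ResShift | 15 | 24.65 | 0.6181 | 0.3349 | 0.2213 | 36.11 |
| DIV2K-Val | OSEDiff | 1 | 23.72 | 0.6108 | 0.2941 | 0.1976 | 26.32 |
| DIV2K-Val | GuideSR | 1 | 24.76 | 0.6333 | 0.2653 | 0.1879 | 21.04 |
| DRealSR | ResShift | 15 | 28.46 | 0.7673 | 0.4006 | 0.2656 | 172.26 |
| DRealSR | OSEDiff | 1 | 27.92 | 0.7835 | 0.2968 | 0.2165 | 135.30 |
| DRealSR | GuideSR | 1 | 29.85 | 0.8078 | 0.2640 | 0.1960 | 122.06 |
| RealSR | ResShift | 15 | 26.31 | 0.7421 | 0.3460 | 0.2498 | 141.71 |
| RealSR | OSEDiff | 1 | 25.15 | 0.7341 | 0.2921 | 0.2128 | 123.49 |
| RealSR | GuideSR | 1 | 27.08 | 0.7681 | 0.2407 | 0.1878 | 96.83 |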

Key empirical findings include:

  • On DRealSR, GuideSR improves PSNR by 1.39 dB over the strongest baseline in the table (ResShift, 15 steps) and by 1.93 dB over the single-step OSEDiff.
  • Substantial FID reduction versus OSEDiff: 13.24 on DRealSR and 26.66 on RealSR.
  • GuideSR also obtains the best SSIM, LPIPS, and DISTS on all three datasets.
  • Qualitative analysis reveals preservation of fine textures, such as text detail, reflective surfaces, and geometric elements, which competing methods blur or inaccurately hallucinate.
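The PSNR and FID differences quoted above can be recomputed directly from the table. This short script does so, with values transcribed from the table in the order ResShift, OSEDiff, GuideSR.

```python
PSNR = {  # (ResShift, OSEDiff, GuideSR), in dB, higher is better
    "DIV2K-Val": (24.65, 23.72, 24.76),
    "DRealSR": (28.46, 27.92, 29.85),
    "RealSR": (26.31, 25.15, 27.08),
}
FID = {  # (ResShift, OSEDiff, GuideSR), lower is better
    "DIV2K-Val": (36.11, 26.32, 21.04),
    "DRealSR": (172.26, 135.30, 122.06),
    "RealSR": (141.71, 123.49, 96.83),
}

def summarize(psnr, fid):
    """Per dataset: (PSNR gain over the best baseline, FID reduction vs OSEDiff)."""
    out = {}
    for name, (resshift, osediff, guidesr) in psnr.items():
        psnr_gain = round(guidesr - max(resshift, osediff), 2)
        fid_drop = round(fid[name][1] - fid[name][2], 2)
        out[name] = (psnr_gain, fid_drop)
    return out

for name, (gain, drop) in summarize(PSNR, FID).items():
    print(f"{name}: PSNR +{gain} dB over best baseline, FID -{drop} vs OSEDiff")
```

GuideSR's PSNR gains over the best listed baseline are 0.11, 1.39, and 0.77 dB on DIV2K-Val, DRealSR, and RealSR. The corresponding FID reductions versus OSEDiff are 5.28, 13.24, and 26.66.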

7. Computational Efficiency

Compared to traditional multi-step diffusion methods (e.g., StableSR at 200 steps or DiffBIR at 50 steps), which demand extensive UNet runtimes (seconds to minutes per image on A100 GPUs), GuideSR achieves real-time inference. The addition of the Guidance Branch (typically 8–12 FRBs plus IGN) increases floating-point operations by approximately 10% compared to existing single-step frameworks such as OSEDiff. In practical terms, GuideSR achieves end-to-end inference in approximately 0.3–0.5 s per image on A100 hardware, preserving real-time computational feasibility while advancing restoration fidelity.

8. Significance and Practical Implications

GuideSR advances the state of the art in diffusion-based super-resolution by directly addressing the structural fidelity limitations of prior single-step models. The integration of a dedicated full-resolution Guidance Branch with efficient cross-branch fusion ensures both image faithfulness and perceptual enhancement. Achieving improvements across both reference-based pixel metrics (PSNR, SSIM) and perceptual feature distances (LPIPS, DISTS, FID), GuideSR provides a practical, computationally efficient SR solution suitable for real-world image restoration scenarios (Arora et al., 1 May 2025).
