Zero-Shot Super-Resolution
- Zero-shot super-resolution is a paradigm that uses test-time adaptation and internal self-similarity to enhance low-resolution data without external training pairs.
- The approach employs methods like deep internal learning, self-supervision, and modular external priors to generate high-quality outputs from arbitrary degradations.
- It is applied across domains such as medical imaging, scientific simulation, and graphics, overcoming limitations inherent to traditional supervised frameworks.
Zero-shot super-resolution is a paradigm in which a model, without access to external paired examples or prior training on the specific data distribution, performs inference-time learning to recover high-resolution (HR) content from a given low-resolution (LR) observation. This approach stands in contrast to conventional supervised super-resolution pipelines, which are typically trained on large datasets of registered LR–HR pairs and thus are often rigid, sensitive to mismatches in real-world acquisition, and unable to generalize to arbitrary degradations or domains. Zero-shot super-resolution leverages either the internal recurrence of information within a single data instance (such as an image or point cloud), self-supervision across scales, or external priors incorporated in a modular, inference-time fashion. The domain has rapidly expanded from its inception in single-image super-resolution to encompass 3D data, medical and scientific imaging, omnidirectional imagery, text- and language-guided exploration, diffusion and consistency models, neural operators for physical fields, and mesh textures, among others.
1. Fundamental Principles and Historical Development
The essential principle of zero-shot super-resolution is test-time, data-instance-specific adaptation, without recourse to external LR–HR datasets or assumptions about a fixed degradation model. The canonical method, introduced in "Zero-Shot" Super-Resolution using Deep Internal Learning (Shocher et al., 2017), involves extracting internal statistics—such as recurring patches or features at multiple scales—directly from the test image. A small convolutional neural network (CNN) is trained from scratch at inference, using as its "training data" the downscaled versions of the test image itself. This paradigm leverages the low entropy and rich self-similarity of natural images at multiple scales to efficiently learn a mapping from LR to HR for that particular instance.
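To make the mechanism concrete, the following is a minimal sketch (in PyTorch) of an instance-specific training loop in the spirit of ZSSR. The function name, crop size, step count, and the small residual network `model` are illustrative assumptions; the published method additionally uses multiple scales, geometric augmentations, and a learned adjustment of the downscaling kernel.

```python
import torch
import torch.nn.functional as F

def zssr_train(test_img, model, scale=2, steps=1000, lr=1e-3, crop=96):
    """Instance-specific training in the spirit of ZSSR (simplified sketch).
    test_img: (1, C, H, W) tensor, assumed larger than `crop` in each dim.
    model:    small CNN predicting an HR residual at the input resolution."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    h, w = test_img.shape[-2:]
    for _ in range(steps):
        # Sample an HR "parent" crop from the test image itself ...
        y0 = torch.randint(0, h - crop + 1, (1,)).item()
        x0 = torch.randint(0, w - crop + 1, (1,)).item()
        hr_parent = test_img[..., y0:y0 + crop, x0:x0 + crop]
        # ... and synthesize its LR "son" by downscaling; this pair is the
        # only supervision the network ever sees.
        lr_son = F.interpolate(hr_parent, scale_factor=1 / scale,
                               mode='bicubic', align_corners=False)
        lr_up = F.interpolate(lr_son, size=(crop, crop),
                              mode='bicubic', align_corners=False)
        loss = F.l1_loss(lr_up + model(lr_up), hr_parent)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # At inference, apply the instance-adapted model to the test image itself.
    up = F.interpolate(test_img, scale_factor=scale,
                       mode='bicubic', align_corners=False)
    return up + model(up)
```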
Subsequent work—such as Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR) (Soh et al., 2020)—extends these ideas by finding meta-learned initializations from external data that allow for rapid adaptation to new images and degradations, reducing the time required for inference-time optimization. Methods like RZSR (Yoo et al., 2022) and various internal learning approaches for 3D data and scientific domains generalize the zero-shot notion to cases where internal instance structure can be further augmented with internal reference patches or physics-based features.
2. Methodological Frameworks
Zero-shot super-resolution methods can be organized along the following methodological axes:
A. Deep Internal Learning
- Instance-specific CNNs are trained at inference on patch pairs, exploiting self-similarity (e.g., ZSSR (Shocher et al., 2017)).
- Extensions include meta-initialization for rapid convergence (MZSR (Soh et al., 2020)) and reference-based zero-shot SR (RZSR (Yoo et al., 2022)) using cross-scale patch retrieval and depth guidance.
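Reference-based variants exploit cross-scale internal recurrence: patches of the input often reappear, more sharply, at coarser scales of the same image. Below is a compact sketch of such a retrieval step; the patch size, stride, and cosine-similarity matcher are illustrative simplifications rather than RZSR's exact pipeline (which also incorporates depth guidance).

```python
import torch
import torch.nn.functional as F

def cross_scale_matches(img, patch=8, stride=4, scale=2):
    """For each patch of `img`, find its nearest neighbour in a downscaled
    copy of the same image (cross-scale internal recurrence).
    img: (1, C, H, W). Returns, per query patch, the index of the best
    matching coarse-scale patch."""
    coarse = F.interpolate(img, scale_factor=1 / scale, mode='bicubic',
                           align_corners=False)
    # Unfold both images into flattened patch descriptors.
    q = F.unfold(img, kernel_size=patch, stride=stride)     # (1, C*p*p, Nq)
    k = F.unfold(coarse, kernel_size=patch, stride=stride)  # (1, C*p*p, Nk)
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    sim = q.transpose(1, 2) @ k                              # (1, Nq, Nk)
    return sim.argmax(dim=-1)                                # (1, Nq)
```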
B. Self-Supervision through Data Manipulation
- Random sub-sampling and super-resolution frameworks allow self-supervision from only one data instance (Noise2SR (Tian et al., 20 Jun 2024) for microscopy denoising/SR, ZSPU (Zhou et al., 2021) for 3D point clouds).
- Super-resolution is formulated as an inverse problem, with synthetic LR–HR pairs generated from within the same datum.
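As a concrete illustration of self-supervision from a single noisy observation, the following sketch implements a simplified sub-sample-and-reconstruct step in the spirit of Noise2SR: one pixel per block is fed to the network, and the loss is computed only on the held-out pixels, whose noise is approximately independent of the input. The single-offset sub-sampling and masked MSE loss are simplifications assumed for illustration.

```python
import torch

def noise2sr_style_step(model, noisy, opt, factor=2):
    """One self-supervised step in the spirit of Noise2SR (simplified sketch).
    noisy: (B, C, H, W) single noisy observation; H, W divisible by `factor`.
    model: network mapping the sub-sampled grid back to the full grid."""
    # Pick one offset per block: these pixels form the LR input.
    dy, dx = torch.randint(0, factor, (2,)).tolist()
    lr_input = noisy[..., dy::factor, dx::factor]       # (B, C, H/f, W/f)
    # Supervise only on pixels NOT shown to the network, so their noise is
    # independent of the input; matching them pulls the output toward the
    # underlying clean signal rather than the noise.
    mask = torch.ones_like(noisy)
    mask[..., dy::factor, dx::factor] = 0.0
    pred = model(lr_input)                              # expected: (B, C, H, W)
    loss = ((pred - noisy) ** 2 * mask).sum() / mask.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```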
C. Physics- and Domain-informed Models
- For scientific images, domain knowledge is encoded via differentiable forward and backward operators (e.g., SADIR-Net for CT (Zhang et al., 2020), EC-SRGAN for turbulence (Wu et al., 22 Jul 2024), TNOs for urban micrometeorology (Yasuda et al., 30 Apr 2025)).
D. Modular Plug-in of External Priors
- Strong external priors can be integrated in a "plug-and-play" fashion. For instance, in zero-shot CT SR using 2D X-ray priors (Noh et al., 21 Aug 2025), diffusion models trained on 2D X-rays provide HR projections that guide 3D Gaussian splatting reconstruction in a fully unsupervised setting.
E. Diffusion and Consistency Models
- Recent works leverage pretrained diffusion or consistency models as generic image priors, with task-specific guidance (e.g., DDNM, consistency models (Garber et al., 29 Dec 2024), or text-guided methods (Gandikota et al., 2 Mar 2024)). Back-projection, pseudoinverse, or CLIP guidance modules directly enforce data and semantic consistency.
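A generic form of the data-consistency correction that such samplers interleave with the prior's denoising steps is sketched below. Bicubic downsampling standing in for the degradation operator, and bicubic upsampling standing in for its pseudoinverse, are assumptions for illustration rather than any specific paper's operators.

```python
import torch.nn.functional as F

def degrade(x, scale=4):
    """Assumed degradation operator A: bicubic downsampling by `scale`."""
    return F.interpolate(x, scale_factor=1 / scale, mode='bicubic',
                         align_corners=False)

def pseudo_inverse(y, scale=4):
    """Cheap surrogate for the pseudoinverse of A: bicubic upsampling."""
    return F.interpolate(y, scale_factor=scale, mode='bicubic',
                         align_corners=False)

def back_projection_step(x_hat, y_lr, scale=4, step=1.0):
    """Correct a prior-generated HR estimate x_hat so that its degraded
    version moves toward the observed LR image y_lr."""
    residual = y_lr - degrade(x_hat, scale)
    return x_hat + step * pseudo_inverse(residual, scale)
```

Interleaving such a correction with the denoising steps of a pretrained diffusion or consistency model yields samples that are both plausible under the prior and consistent with the LR observation.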
F. Neural Operators and Resolution-Invariant Models
- Operator learning frameworks (e.g., FNOs, DFU (Havrilla et al., 2023), TNOs) learn mappings between function spaces, supporting grid-invariant, zero-shot super-resolution, even with unstructured data (Yasuda et al., 30 Apr 2025) or for scientific weather downscaling (Sinha et al., 21 Sep 2024).
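The grid invariance at the heart of these operators can be illustrated with a minimal FNO-style spectral layer: the learned weights live on a fixed set of low Fourier modes rather than on a pixel grid, so the same parameters can be evaluated on a finer grid at inference. The sketch below keeps only one low-frequency corner of the spectrum, and the class name and hyperparameters are illustrative; full FNO implementations retain both positive and negative frequency modes.

```python
import torch

class SpectralConv2d(torch.nn.Module):
    """Minimal FNO-style spectral layer (illustrative sketch).
    Weights are defined on a fixed number of low Fourier modes, so the
    layer accepts inputs on any grid with at least `modes` frequencies
    per dimension; zero-shot SR amounts to evaluating the trained
    operator on a finer input grid."""
    def __init__(self, channels, modes=12):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weight = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes,
                                dtype=torch.cfloat))

    def forward(self, x):                       # x: (B, C, H, W), any grid
        b, c, h, w = x.shape
        x_ft = torch.fft.rfft2(x)               # (B, C, H, W//2 + 1), complex
        out_ft = torch.zeros(b, c, h, w // 2 + 1, dtype=torch.cfloat,
                             device=x.device)
        m = self.modes
        # Mix channels on the retained low-frequency modes only.
        out_ft[:, :, :m, :m] = torch.einsum('bixy,ioxy->boxy',
                                            x_ft[:, :, :m, :m], self.weight)
        return torch.fft.irfft2(out_ft, s=(h, w))
```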
3. Internal Recurrence and Adaptation to Degradations
A distinguishing characteristic is the capacity for per-instance adaptation to previously unseen degradations. Since the network is either trained de novo or fine-tuned at test time, its receptive field, parameterization, and even loss can flexibly adapt to degradation processes such as non-ideal blur, sensor-specific noise, or domain variation. This circumvents the main limitation of supervised SR: generalization only to degradations present in the training corpus.
Examples include:
- ZSSR (Shocher et al., 2017): Adapting to unknown blur, noise, and artifacts by training only on synthetic variants derived from the input.
- Endomicroscopy ZSSR (Szczotka et al., 2021): Modeling irregular fiber-based acquisition physics via Voronoi downsampling kernels and simulating realistic non-Gaussian, multiplicative and additive noise.
- SADIR-Net (Zhang et al., 2020): Incorporating CT physics via both sinogram-domain SR and image-domain deblurring, unrolled as a deep network.
When external, cross-domain priors are incorporated (as in zero-shot CT with 2D X-ray priors (Noh et al., 21 Aug 2025)), they are dynamically fused at inference via mechanisms such as per-projection adaptive sampling and negative alpha blending (to enable residual learning), which target high-frequency details lost in internally trained zero-shot regimes.
4. Mathematical and Algorithmic Formulations
Zero-shot super-resolution is formalized as an inverse problem. The LR observation $y$ is modeled as a degraded version of the unknown HR signal $x$,
$$y = (x * k)\downarrow_s + \, n,$$
where $k$ is a (possibly unknown) blur kernel, $\downarrow_s$ denotes sub-sampling by a factor $s$, and $n$ is noise. A network $f_\theta$ is optimized solely using internal data (e.g., patch pairs drawn from $y$ itself),
$$\theta^\star = \arg\min_\theta \sum_i \big\| f_\theta\big(y_i\!\downarrow_s\big) - y_i \big\|_1, \qquad \hat{x} = f_{\theta^\star}(y),$$
or with self-supervised losses defined between different sub-samplings of the same observation (e.g., Noise2SR (Tian et al., 20 Jun 2024)).
Advanced algorithmic strategies include:
- Back-projection guidance for consistency models (Garber et al., 29 Dec 2024), which corrects intermediate samples so that their degraded versions agree with the LR observation.
- Per-projection adaptive sampling (PAS) in diffusion-guided CT SR (Noh et al., 21 Aug 2025).
- Residual learning via negative alpha blending in Gaussian splatting (NAB-GS) (Noh et al., 21 Aug 2025).
- Dual convolution operators for scale-robust diffusion (DFU (Havrilla et al., 2023)).
Each formulation is tailored to the data, domain, and available priors, but all share minimal or no reliance on explicit HR training data from outside the instance.
5. Empirical Performance and Benchmarking
Zero-shot super-resolution methods demonstrate distinctive strengths and trade-offs compared to supervised pipelines:
- On real-world images with unknown or complex degradations, adaptive zero-shot approaches (ZSSR (Shocher et al., 2017), MZSR (Soh et al., 2020), RZSR (Yoo et al., 2022)) frequently outperform CNNs trained on synthetic data, owing to their flexibility.
- For biomedical and scientific images—where paired data is rare—zero-shot frameworks (SADIR-Net (Zhang et al., 2020), CuNeRF (Chen et al., 2023), Gaussian splatting with 2D priors (Noh et al., 21 Aug 2025)) provide competitive and sometimes superior results (e.g., in PSNR/SSIM, structural feature recovery, and modulation transfer function).
- In large-scale generative tasks (zero-shot text-driven panorama generation (Chen et al., 2022), DFU (Havrilla et al., 2023)), joint training on multiple resolutions, operator-based architectures, and dual codebook modeling allow generalization to higher resolutions unseen during training, achieving strong FID and IS scores.
- In turbulence and weather downscaling, operator-based and GAN-based zero-shot methods (Wu et al., 22 Jul 2024, Sinha et al., 21 Sep 2024) recover small-scale physical features and spectra, though transformer-based models often outperform purely resolution-invariant operators in extreme zero-shot scenarios.
Empirical benchmarking indicates that zero-shot methods are highly competitive and occasionally state-of-the-art when confronted with mismatches between training and inference degradations or domains. Computational cost remains a consideration, as inference-time optimization or sampling (in the case of diffusion/consistency models) can be intensive, although recent work achieves sharp reductions in function evaluations (e.g., 4 NFEs with CMs (Garber et al., 29 Dec 2024)).
6. Application Domains and Impact
Zero-shot super-resolution has seen adoption and extension in a wide array of scientific, engineering, and creative domains:
- Medical imaging: CT, MRI, and electron microscopy (e.g., SADIR-Net (Zhang et al., 2020), CuNeRF (Chen et al., 2023), Noise2SR (Tian et al., 20 Jun 2024), 3DGS-NAB (Noh et al., 21 Aug 2025)).
- Scientific simulation: Turbulence reconstruction (Wu et al., 22 Jul 2024), weather downscaling (Sinha et al., 21 Sep 2024), urban micrometeorology (Yasuda et al., 30 Apr 2025).
- Graphics and rendering: HDR panorama and 360° scene synthesis (Chen et al., 2022), mesh PBR texture upscaling (Chen et al., 3 Jun 2025), omnidirectional images (Li et al., 16 Apr 2024).
- 3D point cloud upsampling: Learning from single-instance point clouds (Zhou et al., 2021).
- Text-guided super-resolution: Exploration of the set of valid HR solutions under textual constraints (Gandikota et al., 2 Mar 2024).
The main impact is enabling SR in domains that lack HR training sets, exhibit domain shift, or demand physical constraints or instance-specific detail.
7. Future Perspectives and Current Limitations
There are several active research directions and limitations:
- Efficiency of Inference: Many schemes require non-trivial optimization or multiple function evaluations per instance; progress (e.g., few-step consistency models (Garber et al., 29 Dec 2024), meta-initializations (Soh et al., 2020)) is closing this gap.
- Hybridization with External Priors: Emerging work tightly integrates instance-specific adaptation with strong external priors (diffusion models, semantic models, or operator learning), yielding plug-and-play flexibility (e.g., in medical imaging (Noh et al., 21 Aug 2025), relighting/texturing (Chen et al., 3 Jun 2025)).
- Extension to Structured and Unstructured Grids: Neural operator and transformer-based approaches generalize zero-shot SR to data on both regular and unstructured grids (Yasuda et al., 30 Apr 2025).
- Ill-posedness and Diversity: Methods such as text-guided SR (Gandikota et al., 2 Mar 2024) highlight the fundamentally underdetermined nature of super-resolution at high upscaling ratios, reframing SR as a solution exploration problem and leveraging generative models for controlled diversity.
In summary, zero-shot super-resolution offers a versatile and robust set of methodologies that depart from the rigid, dataset-dependent structure of classical super-resolution. By leveraging internal recurrence, instance adaptation, external priors, and operator learning, these techniques demonstrate strong empirical performance across ill-posed problems, variable domains, and limited data settings, while continuing to expand the practical and theoretical frontiers of SR research.