Repeated Upscaling-Downscaling Process
- RUDP is a technique that iteratively enhances image quality by alternately applying upsampling and downsampling to reconstruct high-frequency details.
- It integrates deep supervision, detail loss, and invertible mappings within modern super-resolution architectures to preserve image fidelity.
- Empirical studies show that RUDP improves PSNR and stabilizes output quality over multiple cycles, unifying classical methods with advanced deep learning.
The Repeated Upscaling–Downscaling Process (RUDP) is a methodology for enhancing the quality of high-dimensional signals—primarily images—via iterative cycles of upsampling and downsampling. Its formal structure and application have been rigorously developed for deep learning–based super-resolution, particularly in contexts where high-frequency detail reconstruction and bidirectional mappings between resolutions are paramount (Han et al., 14 Jan 2026, Pan et al., 2022, Michelini et al., 2018, Sun et al., 2023, Park et al., 2019). The process has emerged as a unifying abstraction connecting classic iterative back-projection, multigrid methods, modern attention-based super-resolution, invertible flows, and cycle-consistency constraints for arbitrary-scale rescaling.
1. Mathematical Formalism and Algorithmic Variants
At its core, RUDP is defined by the repeated application of an upsampling operator $U$ and a downsampling operator $D$ to a low-resolution (LR) feature or image. For a given input $x^{(0)} = x_{\mathrm{LR}}$, the canonical $T$-step RUDP cycle is

$$y^{(t)} = U\big(x^{(t-1)}\big), \qquad x^{(t)} = D\big(y^{(t)}\big), \qquad t = 1, \dots, T.$$

For $t = T$: the output $y^{(T)}$ is mapped to the final RGB super-resolved image, and optionally the detail images and intermediate SR outputs $y^{(1)}, \dots, y^{(T-1)}$ are also made available (Han et al., 14 Jan 2026).
In bidirectional or arbitrary-scale variants, such as BAIRNet (Pan et al., 2022), the process uses learned, mutually inverse upscaling and downscaling mappings parameterized for arbitrary scale factors and trained to be robust to repeated cycles, with explicit cycle-idempotence objectives.
The process can be instantiated recursively (multi-level, as in multigrid methods (Michelini et al., 2018)) or in stage-wise, recurrent training loops (Park et al., 2019).
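As a concrete illustration, the canonical cycle can be sketched with fixed operators standing in for the learned blocks (nearest-neighbour upsampling and average-pool downsampling, chosen here purely for illustration):

```python
import numpy as np

def upsample(x, s=2):
    """Nearest-neighbour upsampling (stand-in for a learned upscale block)."""
    return x.repeat(s, axis=0).repeat(s, axis=1)

def downsample(x, s=2):
    """Average-pool downsampling (stand-in for a learned downscale block)."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def rudp(x_lr, T=3, s=2):
    """Run T upscale->downscale cycles; return every intermediate SR output."""
    sr_outputs = []
    x = x_lr
    for _ in range(T):
        y = upsample(x, s)      # candidate SR output for this stage
        sr_outputs.append(y)
        x = downsample(y, s)    # re-projected LR input for the next stage
    return sr_outputs

lr = np.random.rand(8, 8)
outs = rudp(lr, T=3)
print([o.shape for o in outs])  # each stage yields a full-size SR candidate
```

In a trained model each stage's output is supervised independently; here the fixed operators only trace the data flow and tensor shapes of the cycle.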
2. Integration into Deep Super-Resolution Architectures
RUDP acts as a meta-structuring device for deep super-resolution networks. A typical architecture employing RUDP includes:
- Feature-Extraction Block (FE): Processes the LR image to produce a feature tensor, typically via convolution and residual blocks.
- Upscale Block (UB): Transposed convolution and additional layers upscale the feature map; a detail block isolates high-frequency structure.
- Downscale Block (DB): Strided convolutions (or equivalent operators) reduce the resolution of the upscaled output, generating new inputs for subsequent RUDP iterations.
- Iterative Pipeline: The pipeline cycles through repetitions of UB→DB, each supervised independently (deep supervision) via reconstruction and detail losses (Han et al., 14 Jan 2026, Michelini et al., 2018).
Specialized RUDP architectures, e.g., BAIRNet (Pan et al., 2022), employ shared encoders and subpixel MLPs to enable scale-continuity and preserve detail through repeated rescaling cycles. In SDFlow (Sun et al., 2023), a single invertible normalizing-flow network simultaneously models both LR→HR (super-resolution) and HR→LR (downscaling) by decoupling shared content and domain-specific latent variables.
3. Training Objectives, Losses, and Deep Supervision
RUDP-enhanced models are supervised not solely by standard image reconstruction losses but also by high-frequency–focused objectives. A prototypical design is:
- Reconstruction Loss: an $\ell_1$ penalty between the stage-$t$ SR output $y^{(t)}$ and the HR target $y_{\mathrm{HR}}$, $\mathcal{L}_{\mathrm{rec}}^{(t)} = \lVert y^{(t)} - y_{\mathrm{HR}} \rVert_1$.
- Detail Loss: using a Laplacian-pyramid decomposition $\mathrm{Lap}(\cdot)$, $\mathcal{L}_{\mathrm{det}}^{(t)} = \lVert \mathrm{Lap}(y^{(t)}) - \mathrm{Lap}(y_{\mathrm{HR}}) \rVert_1$.
- Total Loss: weighted sum across RUDP stages, $\mathcal{L} = \sum_{t=1}^{T} w_t \big( \mathcal{L}_{\mathrm{rec}}^{(t)} + \lambda\, \mathcal{L}_{\mathrm{det}}^{(t)} \big)$,

where $\lambda$ controls the detail-vs-reconstruction tradeoff and $w_t$ sets the relative importance of each RUDP stage (Han et al., 14 Jan 2026).
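A minimal NumPy sketch of this supervision scheme, assuming an L1 norm throughout and a single-level Laplacian residual (image minus blur) in place of a full pyramid:

```python
import numpy as np

def box_blur(img, k=3):
    """Box filter as a cheap stand-in for the Gaussian blur of a Laplacian pyramid."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def detail(img):
    """High-frequency residual: the top level of a Laplacian decomposition."""
    return img - box_blur(img)

def stage_loss(sr, hr, lam=0.1):
    """Reconstruction loss plus lam-weighted detail loss for one RUDP stage."""
    rec = np.abs(sr - hr).mean()
    det = np.abs(detail(sr) - detail(hr)).mean()
    return rec + lam * det

def total_loss(sr_stages, hr, weights, lam=0.1):
    """Deep supervision: weighted sum of per-stage losses."""
    return sum(w * stage_loss(sr, hr, lam) for w, sr in zip(weights, sr_stages))

hr = np.random.rand(16, 16)
stages = [hr + 0.05, hr + 0.01, hr]  # progressively better SR estimates
print(total_loss(stages, hr, weights=[0.5, 0.75, 1.0]))  # ~0.0325
```

The increasing stage weights mirror the recommendation below that later cycles receive higher loss weights.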
For arbitrary-scale and cycle-idempotent objectives (Pan et al., 2022), loss terms include cycle-consistency, a weak LR reference loss, and robustness to repeated up/down cycles. Invertible models (SDFlow (Sun et al., 2023)) minimize exact maximum likelihood, latent-space alignment losses, and adversarial terms on the content/structure codes, enabling unpaired, bidirectional learning.
4. Theoretical Rationale and Empirical Effects
The motivation for RUDP is both empirical and theoretical:
- High-Frequency Amplification: Repeated rescaling, especially when coupled with explicit detail supervision, forces the network to progressively correct high-frequency detail and residual errors, mitigating the over-smoothing typical of pixel-wise, MSE-only approaches (Han et al., 14 Jan 2026).
- Feature Diversity: Cycling outputs through the network after downscaling exposes subsequent RUDP iterations to previously reconstructed as well as new detail, increasing diversity and promoting robustness against artifacts (Han et al., 14 Jan 2026).
- Progressive Refinement: Deep supervision on intermediates stabilizes training, enables incremental error correction, and encourages convergence to sharper outputs (Michelini et al., 2018).
- Cycle-Idempotency: In arbitrary-scale or invertible architectures, explicit multi-cycle losses ensure the RUDP system avoids error accumulation or drift even under repeated application, preserving both numerical fidelity and perceptual realism (Pan et al., 2022, Sun et al., 2023).
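Cycle-idempotence can be made concrete with fixed operators: a nearest-neighbour upsampler paired with an average-pool downsampler satisfies down(up(x)) = x, which is the behavior the multi-cycle losses push learned operator pairs toward. A sketch under that assumption:

```python
import numpy as np

def up(x, s=2):
    return x.repeat(s, axis=0).repeat(s, axis=1)

def down(x, s=2):
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def cycle_drift(x, n_cycles=5):
    """Max absolute deviation from the original after each down(up(.)) round trip."""
    cur, drifts = x, []
    for _ in range(n_cycles):
        cur = down(up(cur))
        drifts.append(np.abs(cur - x).max())
    return drifts

x = np.random.rand(16, 16)
print(max(cycle_drift(x)))  # stays at floating-point noise: the pair is cycle-idempotent
```

Learned networks only approximate this identity, which is why BAIRNet-style training adds explicit multi-cycle penalties rather than relying on the architecture alone.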
5. Empirical Results and Ablation Studies
Controlled ablations confirm the effectiveness and complementarity of RUDP and detail supervision. For the LaUD network (Han et al., 14 Jan 2026), comparative results on Set5/Set14/BSD100 show:
| Model | RUDP | DetailLoss | Set5 PSNR | Set14 PSNR | BSD100 PSNR |
|---|---|---|---|---|---|
| M1 | no | no | 38.09 | 33.97 | 32.29 |
| M2 | no | yes | 38.28 | 34.28 | 32.40 |
| M3 | yes | no | 38.32 | 34.61 | 32.49 |
| M4 | yes | yes | 38.42 | 34.77 | 32.55 |
RUDP alone improves PSNR (e.g., +0.23 dB on Set5 for M3 over M1), with further gains when combined with detail loss (+0.33 dB for M4) (Han et al., 14 Jan 2026).
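For reference, the PSNR figures in the table follow the standard definition; a sketch for images scaled to [0, 1]:

```python
import numpy as np

def psnr(sr, hr, peak=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

hr = np.zeros((4, 4))
sr = hr + 0.1                  # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(sr, hr), 2))  # 20.0
```

Because the scale is logarithmic, the +0.33 dB gain from M1 to M4 on Set5 corresponds to roughly a 7% reduction in mean squared error.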
Robustness to repeated cycles is quantitatively demonstrated in BAIRNet (Pan et al., 2022), where the model sustains less than 0.5 dB degradation over five RUDP cycles, contrasting with 2–3 dB drops in earlier invertible or unidirectional models, and no visible ghosting or color warps after multiple passes.
Normalizing-flow models (SDFlow (Sun et al., 2023)) enable stochastic, many-to-many RUDP cycles, producing diverse yet plausible HR–LR pairs over successive cycles, with stable perceptual metrics and reference-free fidelity scores.
6. Relationships to Classical Methods and Unifying Abstractions
RUDP generalizes classic iterative back-projection (IBP) (Michelini et al., 2018), where an upscaler is corrected by feeding back errors via a downscaling operator, as well as multigrid solvers that recursively transfer corrections across scales. Modern RUDP implementations embed these principles in deep architectures, replacing fixed filters with trainable convolutional or flow-based modules.
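The classic IBP loop that RUDP generalizes is short enough to state directly; this sketch again substitutes fixed nearest-neighbour and average-pool operators for the learned up/down pair:

```python
import numpy as np

def up(x, s=2):
    return x.repeat(s, axis=0).repeat(s, axis=1)

def down(x, s=2):
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def iterative_back_projection(y_lr, n_iters=10, s=2):
    """IBP: refine an SR estimate by repeatedly back-projecting the LR residual."""
    x = up(y_lr, s)                   # initial SR estimate
    for _ in range(n_iters):
        residual = y_lr - down(x, s)  # observation error in the LR domain
        x = x + up(residual, s)       # project the correction back to HR
    return x

y = np.random.rand(8, 8)
x = iterative_back_projection(y)
print(np.abs(down(x) - y).max())  # ~0: the estimate is consistent with the LR input
```

Deep RUDP variants keep this correct-by-feedback structure but replace the fixed projection operators with trainable modules supervised at every stage.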
In multigrid back-projection super-resolution (Michelini et al., 2018), RUDP is instantiated as a recursive, learned V- or W-cycle, synthesizing outputs at multiple upscaling factors with minimal parameter counts and per-cycle residual corrections.
The stage-wise RUDP in recurrent SR network training (Park et al., 2019) extends the concept to self-distillation: each SR network is trained against increasingly enhanced HR targets generated via its own previous stages, leading to sharper, more perceptually aligned outcomes.
7. Design Considerations, Limitations, and Practical Guidelines
Empirical findings support optimal RUDP repeat counts ($T$) of 3–4, where both PSNR and perceptual metrics (e.g., VIQET MOS (Park et al., 2019)) typically saturate; further cycles yield negligible improvement or slight over-enhancement. Key parameters and recommendations include:
- Upsampler: Transposed convolution + cascaded convs with LeakyReLU or tailored MLP subpixel modules (Han et al., 14 Jan 2026, Pan et al., 2022).
- Downsampler: Strided convolution with or without learned subpixel weights (Han et al., 14 Jan 2026, Pan et al., 2022).
- Detail-loss norm: $\ell_1$ preferred over $\ell_2$ for edge/texture fidelity (Han et al., 14 Jan 2026).
- Cycle weightings: Later cycles may be given higher loss weights to encourage incremental refinement (Han et al., 14 Jan 2026).
RUDP increases training cost linearly with the number of cycles but imposes negligible additional inference cost if only the final output is required. In bidirectional, invertible, or stochastic settings, RUDP supports arbitrary or indefinite cycling (with stochastic sampling in the latent space yielding diverse outputs (Sun et al., 2023)).
Convergence is practical but not theoretically guaranteed for arbitrary nonlinear architectures; empirical saturation is typically reached by $T = 3$–$4$ (Park et al., 2019).
References:
- "Detail Loss in Super-Resolution Models Based on the Laplacian Pyramid and Repeated Upscaling and Downscaling Process" (Han et al., 14 Jan 2026)
- "Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence" (Pan et al., 2022)
- "Multigrid Backprojection Super-Resolution and Deep Filter Visualization" (Michelini et al., 2018)
- "Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling" (Sun et al., 2023)
- "Image Enhancement by Recurrently-trained Super-resolution Network" (Park et al., 2019)