
Repeated Upscaling and Downscaling Process

Updated 21 January 2026
  • Repeated upscaling and downscaling is a method that alternates upsampling and downsampling operations to iteratively enhance image resolution and robustness.
  • The approach integrates techniques like iterative back-projection, multi-grid recursion, and invertible network modules to efficiently extract and preserve high-frequency details.
  • Cycle consistency losses and Laplacian detail loss are incorporated to stabilize reconstructions and ensure minimal degradation over multiple up/down cycles.

A repeated upscaling and downscaling process is a fundamental operation and architectural motif in modern image super-resolution, enhancement, and rescaling networks. It involves the sequential application of upsampling (increasing spatial resolution) and downsampling (reducing spatial resolution) operators, either within the latent feature space of a neural network or as a component of multi-stage optimization strategies. By iteratively alternating these operations—sometimes in concert with explicit loss functions, cycle consistency penalties, or additional detail-extraction branches—models can improve convergence, stabilize cycle idempotence, and amplify high-frequency information critical for visually plausible reconstructions. This methodology underpins diverse frameworks ranging from Laplacian pyramid–guided enhancement and multi-grid iterative back-projection to fully invertible architectures engineered for bijective mapping between high- and low-resolution domains.

1. Mathematical Definitions and Iterative Frameworks

The process is mathematically structured as compositions of upscaling and downscaling operators acting on image or feature tensors. Upscaling (operator $U$) generally increases the spatial resolution via interpolation, learned transpose convolutions, or pixel-shuffle modules, while downscaling (operator $D$) reduces resolution, typically using strided convolution, pooling, or differentiable downsampling. Formally, for an image or feature map $y$, the cycle is

$$y^{(k+1)} = U(D(y^{(k)})).$$

This core recursion serves as the foundation for various algorithmic instantiations:

  • Iterative Back-Projection (IBP): Classic super-resolution methods use repeated cycles to progressively refine reconstructions, enforcing that upscaled images, when downscaled, match the original LR observation. Let $x$ be the available LR image and $y^{(0)}$ the initial HR estimate; IBP performs

$$y^{(k+1)} = y^{(k)} + U\bigl(x - D\,y^{(k)}\bigr),$$

converging exponentially toward the consistency condition $D\,y = x$ under suitable operator norms (Michelini et al., 2018).

  • Multigrid Recursion: For progressive upscaling, as in Multi-Grid Back-Projection (MGBP), up/down cycles are performed recursively across nested scales, gradually refining feature representations while propagating error corrections (Michelini et al., 2018).
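The IBP recursion can be sketched with simple stand-in operators. The following NumPy example assumes $D$ is 2×2 block averaging and $U$ is nearest-neighbour replication — illustrative choices, not the operators of any cited method. For this particular pair, $D(U(r)) = r$ holds exactly, so the back-projected residual vanishes immediately; learned or filtered operators would converge over several iterations instead.

```python
import numpy as np

def downscale(y, s=2):
    """D: s-fold downscaling by block averaging (a simple box filter)."""
    h, w = y.shape
    return y.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upscale(x, s=2):
    """U: s-fold upscaling by nearest-neighbour replication."""
    return np.repeat(np.repeat(x, s, axis=0), s, axis=1)

def iterative_back_projection(x_lr, n_iters=10, s=2):
    """y^(k+1) = y^(k) + U(x - D y^(k)): refine y until D y matches x."""
    y = upscale(x_lr, s)                      # y^(0): naive initial HR estimate
    for _ in range(n_iters):
        residual = x_lr - downscale(y, s)     # consistency error in the LR domain
        y = y + upscale(residual, s)          # back-project the error into HR space
    return y

rng = np.random.default_rng(0)
x_lr = rng.random((8, 8))
y_hr = iterative_back_projection(x_lr)
print(np.abs(downscale(y_hr) - x_lr).max())  # near zero: D y = x is satisfied
```

The fixed point of the iteration is exactly the consistency condition $D\,y = x$; any $y$ satisfying it leaves the update unchanged.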

2. Neural Network Embodiments of Repeated Upscaling and Downscaling

Contemporary architectures embed the repeated up/down process either as explicit recursion in the pixel or feature domain, or via learned residual correction at each stage:

  • CNN-based Loops (RUDP): The repeated upscaling and downscaling process (RUDP) iteratively upsamples and then downsamples intermediate feature maps, concatenating the resulting tensors with the original LR input and feeding them through additional feature extraction blocks. Denoting the LR input $I_{\text{LR}}$, the $k$-th iteration computes
$$H_{\text{U}}^{(k)} = f_{\uparrow}(f(H^{(k-1)})), \quad H_{\text{D}}^{(k)} = f(f_{\uparrow}(f(H^{(k-1)}))),$$

$$H_{\text{SR}}^{(k)} = H_{\text{U}}^{(k)} + H_{\text{D}}^{(k)},$$

$$H^{(k)} = f_{\downarrow}(f(H_{\text{SR}}^{(k)})), \quad (k < K),$$

with $f_{\uparrow}$ a transposed convolution (upsampling) and $f_{\downarrow}$ a stride-$s$ convolution (downsampling) (Han et al., 14 Jan 2026). This process is repeated $K$ times; $K = 3$ typically yields a favorable trade-off between performance and computational complexity.

  • Attention or Invertible Backbones: In hybrid systems, RUDP modules are inserted between self-attention or dense blocks (e.g., in DRLN or ABPN), or adapted to the up/down projectors of back-projection networks, enhancing high-frequency detail extraction (Han et al., 14 Jan 2026).
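The control flow of the RUDP loop can be sketched compactly. In this NumPy mock-up, identity feature extraction and fixed interpolation stand in for the learned $f$, $f_{\uparrow}$, and $f_{\downarrow}$ of (Han et al., 14 Jan 2026) — it shows only the structure of the $K$-cycle iteration, not the actual network:

```python
import numpy as np

def f(h):
    """Stand-in for a learned feature-extraction block (identity here)."""
    return h

def f_up(h):
    """Stand-in for a transposed convolution: nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(h, 2, axis=0), 2, axis=1)

def f_down(h):
    """Stand-in for a stride-2 convolution: 2x2 block averaging."""
    r, c = h.shape
    return h.reshape(r // 2, 2, c // 2, 2).mean(axis=(1, 3))

def rudp(i_lr, K=3):
    """K cycles of H_U, H_D, H_SR, H per the RUDP recursion."""
    h = i_lr                           # H^(0): features of the LR input
    h_sr = None
    for k in range(K):
        h_u = f_up(f(h))               # H_U^(k): upsampled features
        h_d = f(f_up(f(h)))            # H_D^(k): further extraction at HR scale
        h_sr = h_u + h_d               # H_SR^(k): fused HR features
        if k < K - 1:
            h = f_down(f(h_sr))        # H^(k): back to LR scale for the next cycle
    return h_sr

out = rudp(np.ones((4, 4)))
print(out.shape)  # (8, 8): the final cycle leaves features at the upscaled size
```

Note how only the final cycle's $H_{\text{SR}}^{(K)}$ is kept at HR resolution; every earlier cycle returns to LR scale so that features are repeatedly re-extracted at both resolutions.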

3. Cycle Consistency and Idempotence

Repeated up/down cycles naturally raise questions of idempotence: does the data degrade, drift, or converge under multiple iterations? Cycle consistency losses are imposed to mitigate information loss and drive the system toward a stationary manifold. In the bidirectional arbitrary image rescaling setting (Pan et al., 2022), the $n$-cycle is defined as

$$\hat{x}^{(n)} = (U \circ D)^n(x),$$

and the cycle loss as

$$\mathcal{L}_{\text{cycle}} = \mathbb{E}_{x,s}\bigl[\,\|x - (U \circ D)(x)\|_1\bigr].$$

Empirical evaluation (e.g., over 5 up/down cycles) demonstrates that explicitly trained multi-cycle systems (e.g., BAIRNet†) exhibit limited PSNR degradation (within 1–2 dB over 5 cycles), while invertible architectures (e.g., IRN) maintain stationarity within 0.02 dB, reflecting negligible drift (Pan et al., 2022, Xiao et al., 2020).
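The $n$-cycle drift can be probed directly. A minimal NumPy sketch, assuming box-filter downscaling and nearest-neighbour upscaling as illustrative operators (for this particular pair $U \circ D$ is a projection, so all loss occurs in the first cycle and the drift thereafter is exactly zero — learned operators only approximate this stationarity):

```python
import numpy as np

def D(x):
    """Illustrative downscaler: 2x2 block averaging."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def U(x):
    """Illustrative upscaler: nearest-neighbour 2x replication."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def n_cycle(x, n):
    """(U o D)^n(x): n repeated up/down cycles."""
    for _ in range(n):
        x = U(D(x))
    return x

rng = np.random.default_rng(1)
x = rng.random((16, 16))
cycle_loss = np.abs(x - n_cycle(x, 1)).mean()     # L1 cycle-consistency loss
drift = np.abs(n_cycle(x, 1) - n_cycle(x, 5)).max()  # degradation after cycle 1
print(cycle_loss, drift)  # lossy first cycle, then a fixed point
```

This separation — a one-time projection loss versus per-cycle drift — is exactly what the PSNR-drift comparisons in the literature quantify for learned rescalers.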

4. Integration with High-Frequency Detail Loss

The combination of RUDP and Laplacian pyramid–based detail loss accentuates reconstruction of high-frequency textures. At each iteration $k$, both an SR image loss and a detail loss are imposed:
$$\mathcal{L}_{s}^{(k)} = \lVert I_{\text{HR}} - \hat{I}_{\text{SR}}^{(k)} \rVert_1, \quad \mathcal{L}_{d}^{(k)} = \lVert D_{\text{GT}} - \hat{D}_{\text{SR}}^{(k)} \rVert_1,$$

$$\mathcal{L}^{(k)} = w_k\,\mathcal{L}_{s}^{(k)} + w_k\,\alpha\,\mathcal{L}_{d}^{(k)},$$

with increasing $w_k$ to progressively emphasize the detail loss at deeper iterations. Empirical results confirm that integrating both mechanisms yields synergistic gains in PSNR, SSIM, and perceptual metrics, outperforming both vanilla CNN baselines and certain attention-based networks (Han et al., 14 Jan 2026).
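A sketch of this composite loss follows, where the detail band is taken — as one common Laplacian-pyramid convention, assumed here rather than drawn from the cited paper — to be the residual between an image and its downscale-then-upscale approximation; the weights `w_k` and `alpha` are placeholder values:

```python
import numpy as np

def down(x):
    """2x2 block-average downscaling (illustrative low-pass operator)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """Nearest-neighbour 2x upscaling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def laplacian_detail(img):
    """High-frequency band: image minus its low-pass reconstruction."""
    return img - up(down(img))

def iteration_loss(i_hr, i_sr, w_k=1.0, alpha=0.5):
    """L^(k) = w_k * L_s^(k) + w_k * alpha * L_d^(k), with L1 norms."""
    l_s = np.abs(i_hr - i_sr).mean()                                      # image loss
    l_d = np.abs(laplacian_detail(i_hr) - laplacian_detail(i_sr)).mean()  # detail loss
    return w_k * l_s + w_k * alpha * l_d

rng = np.random.default_rng(2)
i_hr = rng.random((8, 8))
loss = iteration_loss(i_hr, up(down(i_hr)))  # a blurred prediction: nonzero loss
print(loss)
```

Because the detail term penalizes exactly the high-frequency residual, a prediction that matches the HR image's low-pass content but misses its textures still incurs loss — the mechanism by which the detail branch steers reconstruction toward sharp structure.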

5. Invertibility and Bijective Rescaling Networks

Invertible architectures (e.g., IRN, IARN) are designed to make repeated up/down cycles perfectly information-preserving or nearly so. These leverage affine coupling–layer stacks and bijective transforms that guarantee

$$f_{\theta}^{-1}(f_{\theta}(x)) = x, \quad f_{\theta}(f_{\theta}^{-1}(y, z)) = (y, z),$$

for HR image $x$, LR image $y$, and auxiliary latent $z$. Preemptive channel splitting, position-aware scale encoding, and explicit invertibility constraints are used to restrict the effect of non-invertible interpolation to a latent "safe" subspace, thus minimizing cumulative error over arbitrarily many cycles. Empirically, repeated rescaling cycles in these systems sustain virtually unchanged PSNR, confirming quasi-bijectivity (Xiao et al., 2020, Pan et al., 2022).

6. Application: Model Efficiency and Hierarchical Detail Extraction

Recurrent application of up- and downscaling in the training loop can lead to more efficient models. By generating successively sharper targets via alternating upscaling and downscaling of the original images, even compact networks can learn to reconstruct images that match or exceed the perceptual quality (measured by VIQET MOS) of much larger baselines. The improvement saturates after 3–4 recurrences, with later cycles contributing negligibly to perceptual metrics and difference ratios (Park et al., 2019). This strategy allows practitioners to trade training repetitions for inference-time model capacity.

Reference                 | Architecture/Process             | Cycle robustness (PSNR drift)
(Han et al., 14 Jan 2026) | RUDP + Laplacian detail loss     | < 0.05 dB/cycle (for $K \leq 3$)
(Pan et al., 2022)        | BAIRNet, multi-cycle finetuning  | 1–2 dB over 5 cycles
(Xiao et al., 2020)       | IRN, invertible transformation   | 0.02 dB over 5 cycles
(Pan et al., 2022)        | IARN, preemptive splitting       | ~0.01 dB/cycle (empirical)
(Park et al., 2019)       | Recurrent training (RTS)         | converges after 3–4 iterations

7. Limitations and Practical Considerations

While repeated up/down cycles facilitate detail enhancement and robustness, they introduce additional computation and parameter overhead, especially at higher repetition counts ($K \geq 4$), where the marginal benefit becomes minimal. Moreover, consensus exists that for extremely large upscaling factors (e.g., $8\times$), the inherent information deficit in the LR domain may saturate the effectiveness of further recursion, regardless of architectural sophistication (Han et al., 14 Jan 2026). Thus, selecting the optimal number of cycles and harmonizing them with detail-amplifying losses is crucial for resource-efficient and effective deployment.


The repeated upscaling and downscaling process constitutes an integral mechanism across diverse image super-resolution, rescaling, and enhancement frameworks. Its theoretical underpinnings, empirical validation, and architectural versatility continue to drive state-of-the-art results, with innovations in cycle-consistent and invertible mappings further extending its utility in robust, bidirectional, and arbitrary-scale vision applications (Han et al., 14 Jan 2026, Pan et al., 2022, Xiao et al., 2020, Park et al., 2019, Michelini et al., 2018).
