
Laplacian Pyramid Translation Network (LPTN)

Updated 3 February 2026
  • LPTN is an image-to-image translation framework that leverages Laplacian pyramid decomposition to separate global attribute edits from fine detail refinement.
  • It uses a lightweight low-frequency translator and progressive masking for high-frequency refinement, achieving state-of-the-art photorealism and computational efficiency.
  • Evaluations on benchmarks like MIT-Adobe FiveK show real-time 4K translation with improved PSNR and SSIM, demonstrating its practical impact in high-res image processing.

The Laplacian Pyramid Translation Network (LPTN) is an image-to-image translation framework designed for high-resolution, photorealistic transformation with real-time inference, particularly targeting computational efficiency at ultra-high resolutions. LPTN achieves this by decomposing the input image into low-frequency and high-frequency components using a closed-form Laplacian pyramid. The method applies a lightweight translation network to the low-frequency component for global attribute changes (e.g., illumination, color), and employs a progressive masking strategy to selectively refine high-frequency components, thus preserving detailed content. This design avoids the expensive computation associated with directly processing high-resolution feature maps and enables 4K image translation in real-time on a single commodity GPU (Liang et al., 2021).

1. Laplacian Pyramid Decomposition and Reconstruction

LPTN builds a depth-N Laplacian pyramid to hierarchically decompose the input x \equiv L^0 \in \mathbb{R}^{H \times W \times C} as follows:

  • Low-pass downsampling: For level l,

L^{l+1} = \mathrm{Down}(L^l)

where \mathrm{Down}(\cdot) is typically Gaussian filtering followed by 2× subsampling.

  • Residual (high-frequency) extraction:

H^l = L^l - \mathrm{Up}(L^{l+1})

with \mathrm{Up}(\cdot) denoting bilinear or transposed-convolution upsampling.

This yields L^N (the coarsest low-frequency image) and N high-frequency bands H^0, \dots, H^{N-1} of decreasing resolution. Since the pyramid representation is lossless, reconstruction is given by:

x = \mathrm{up}_1\bigl(\mathrm{up}_2(\dots \mathrm{up}_N(L^N) + H^{N-1} \dots) + H^1\bigr) + H^0

where \mathrm{up}_l(\cdot) denotes upsampling (and optional convolution) at level l.
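The decomposition and reconstruction above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it assumes 2×2 average pooling for Down and nearest-neighbour repetition for Up (LPTN uses Gaussian filtering and bilinear upsampling), but the residual structure makes the pyramid exactly invertible regardless of the chosen operators.

```python
import numpy as np

def down(x):
    # 2x2 average pooling as a stand-in for Gaussian blur + 2x subsampling
    h, w = x.shape[:2]
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def up(x):
    # nearest-neighbour 2x upsampling as a stand-in for bilinear upsampling
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decompose(x, n):
    """Build a depth-n Laplacian pyramid: n high-frequency bands plus L^n."""
    highs, low = [], x
    for _ in range(n):
        nxt = down(low)
        highs.append(low - up(nxt))   # H^l = L^l - Up(L^{l+1})
        low = nxt
    return highs, low

def reconstruct(highs, low):
    """Invert the pyramid coarse-to-fine: L^l = Up(L^{l+1}) + H^l."""
    x = low
    for h in reversed(highs):
        x = up(x) + h
    return x

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
highs, low = decompose(img, 3)
assert low.shape == (8, 8, 3)                    # L^3 is 8x smaller per side
assert np.allclose(reconstruct(highs, low), img)  # the pyramid is lossless
```

Because each band H^l stores exactly the information lost by Up(Down(·)), reconstruction is lossless even with these simplified operators.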

2. Network Architecture

The LPTN model explicitly decouples transformation tasks between scales:

(a) Low-Frequency Translator

  • Input: L^N of shape (H/2^N) \times (W/2^N) \times C.
  • Module:

    • Expand: 1\times1 convolution, C \rightarrow F (with F = 32 in practice).
    • Five residual blocks, each containing:
      • 3\times3 conv, F \rightarrow F, LeakyReLU
      • 3\times3 conv, F \rightarrow F, LeakyReLU
      with instance normalization preceding each block.
    • Project: 1\times1 conv, F \rightarrow C; Tanh activation.
    • Output: a residual is predicted,

    \hat{L}^N = L^N + g_{\mathrm{low}}(L^N)

    where g_{\mathrm{low}}(\cdot) is the stack described above.

The parameter count for this module is O(10^5).
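The O(10^5) figure can be sanity-checked with simple arithmetic. This sketch assumes conv layers with biases and ignores the (negligible) instance-norm affine parameters; it follows the layer list above with C = 3 and F = 32.

```python
# Rough parameter count for the low-frequency translator (C=3, F=32),
# assuming biased convolutions; instance-norm parameters are negligible.
C, F = 3, 32

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out  # weights + biases

expand  = conv_params(1, C, F)          # 1x1 conv, C -> F
blocks  = 5 * 2 * conv_params(3, F, F)  # five residual blocks, two 3x3 convs each
project = conv_params(1, F, C)          # 1x1 conv, F -> C

total = expand + blocks + project
print(total)  # roughly 9e4, i.e. on the order of 10^5
```

Nearly all parameters sit in the ten 3×3 F→F convolutions; the expand/project layers contribute only a few hundred.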

(b) High-Frequency Refinement via Progressive Masking

Refinement is performed for each Laplacian high-frequency band using learned per-pixel masks:

  • Stage 1 (band l = N-1):

    • Upsample L^N and \hat{L}^N to match the size of H^{N-1}.
    • Concatenate [\mathrm{Up}(L^N), \mathrm{Up}(\hat{L}^N), H^{N-1}].
    • Pass through a small CNN to produce a per-pixel mask M^{N-1} \in [0, 1]^{(H/2^{N-1}) \times (W/2^{N-1}) \times 1}.
    • Refine:

    \hat{H}^{N-1} = M^{N-1} \odot H^{N-1}

  • Subsequent stages (bands l = N-2 down to 0):

    • Bilinearly upsample the previous mask M^{l+1} by a factor of 2 to initialize M^l.
    • Refine M^l with a two-layer 3\times3 conv block (LeakyReLU).
    • Compute

    \hat{H}^l = M^l \odot H^l

After all bands are refined, \hat{x} is reconstructed from \hat{L}^N and \{\hat{H}^l\} analogously to the original Laplacian reconstruction.
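The masking scheme reduces to upsample-then-multiply per band. The sketch below uses toy random arrays and nearest-neighbour upsampling in place of bilinear; the small per-level CNNs that produce and refine the masks are omitted, so this only illustrates the data flow, not the learned components.

```python
import numpy as np

def up2(m):
    # nearest-neighbour 2x upsampling (LPTN uses bilinear; this is a stand-in)
    return m.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
# toy mask and band at adjacent pyramid scales
m_coarse = rng.random((16, 16, 1))  # mask M^{l+1} from the previous stage
h_fine   = rng.random((32, 32, 3))  # high-frequency band H^l

m_fine = up2(m_coarse)              # initialize M^l from M^{l+1}
# (in LPTN, m_fine would now be refined by a two-layer 3x3 conv block)
h_hat = m_fine * h_fine             # \hat{H}^l = M^l ⊙ H^l (broadcast over channels)

assert h_hat.shape == (32, 32, 3)
```

The single-channel mask broadcasts across the C color channels, so each band is modulated per pixel rather than per pixel-and-channel.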

3. Training Objectives and Optimization

LPTN utilizes both pixel-level and adversarial objectives:

  • Reconstruction loss:

L_{\mathrm{rec}} = \|x - \hat{x}\|_2^2

  • Adversarial loss: least-squares GAN (LSGAN) objectives with a three-scale patch discriminator D.

    • Generator objective:

    L_{\mathrm{adv}}^{G} = \mathbb{E}_{x \sim p_{\mathrm{data}}} \big[(D(G(x)) - 1)^2\big]

    • Discriminator objective:

    L_{\mathrm{adv}}^{D} = \mathbb{E}_{\tilde{x} \sim p_{\mathrm{data}}} \big[(D(\tilde{x}) - 1)^2\big] + \mathbb{E}_{x \sim p_{\mathrm{data}}} \big[D(G(x))^2\big]

  • Joint objective:

L = L_{\mathrm{rec}} + \lambda L_{\mathrm{adv}}

with \lambda = 0.1.

No perceptual loss was used in the published results.
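The objectives above amount to a few lines of numpy. This is a hedged sketch using random stand-in tensors: `d_real`/`d_fake` play the role of (multi-scale, patch-wise) discriminator outputs, and the reconstruction term is written with a mean rather than a raw sum, a common implementation choice that only rescales the loss.

```python
import numpy as np

lam = 0.1  # adversarial weight λ from the paper

def l_rec(x, x_hat):
    # squared-L2 reconstruction loss (mean over pixels for scale invariance)
    return np.mean((x - x_hat) ** 2)

def l_adv_g(d_fake):
    # LSGAN generator term: push D(G(x)) toward 1
    return np.mean((d_fake - 1.0) ** 2)

def l_adv_d(d_real, d_fake):
    # LSGAN discriminator term: D(real) -> 1, D(G(x)) -> 0
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

rng = np.random.default_rng(0)
x, x_hat = rng.random((8, 8, 3)), rng.random((8, 8, 3))
d_real, d_fake = rng.random(16), rng.random(16)

total_g = l_rec(x, x_hat) + lam * l_adv_g(d_fake)  # joint generator objective
total_d = l_adv_d(d_real, d_fake)
```

Note the least-squares form: a perfectly fooled discriminator (D(G(x)) = 1 everywhere) drives the generator's adversarial term exactly to zero.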

4. Computational Efficiency and Inference Performance

LPTN's architecture strategically exploits the sparsity of the Laplacian representation for computational savings:

  • The low-frequency translator operates on L^N, reducing cost by a factor of 2^{2N} relative to full resolution.
  • High-frequency refinement involves only small-resolution maps, with further per-band reductions.
  • Runtime benchmarks on a single 11 GB GPU (RTX 2080 Ti):

| Resolution | N | Time (ms) | FPS |
|------------|---|-----------|-----|
| 480p       | 3 | 3         | 333 |
| 1080p      | 4 | 7         | 143 |
| 2K         | 4 | 15        | 67  |
| 4K         | 5 | 16        | 62  |

Decomposition and reconstruction together require less than 2 ms per 4K image. LPTN thus sustains real-time performance even at 4K.
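The 2^{2N} cost reduction is easy to make concrete. The sketch below tabulates the size of L^N for a 3840×2160 (4K) frame at the pyramid depths used in the benchmarks; integer division is an assumption, since the exact boundary handling of odd dimensions is an implementation detail.

```python
# Spatial cost reduction from operating on the coarsest pyramid level L^N:
# L^N holds only 1/2^(2N) of the full-resolution pixels.
W, H = 3840, 2160  # 4K frame
for n in (3, 4, 5):
    w, h = W // 2**n, H // 2**n
    print(f"N={n}: L^N is {w}x{h}, 2^(2N) = {2**(2*n)}x fewer pixels")
```

At N = 5 the translator sees roughly a 120×67 image, which is why the heavy residual stack stays cheap even for 4K input.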

5. Quantitative and Qualitative Evaluation

LPTN was evaluated on image translation and photo-retouching benchmarks, notably the MIT-Adobe FiveK task. Table results for 1080p resolution:

| Method     | PSNR  | SSIM  |
|------------|-------|-------|
| CycleGAN   | 20.86 | 0.846 |
| UNIT       | 19.32 | 0.802 |
| MUNIT      | 20.28 | 0.815 |
| White-Box  | 21.26 | 0.872 |
| DPE        | 21.94 | 0.885 |
| LPTN (N=3) | 22.09 | 0.883 |

LPTN matches or outperforms prior image-to-image translation (I2I) baselines and is comparable to state-of-the-art methods designed specifically for photo retouching. Qualitatively, LPTN achieves faithful global edits (illumination, color) while preserving edge-level detail. In user studies (20 participants, 20 images), LPTN was rated most photorealistic in over 75% of comparisons and most faithful to the target style in approximately 50%, for day-to-night translation.

6. Implementation and Reproducibility

LPTN is available as open-source software (PyTorch 1.x, Python 3.7) with pre-trained models and scripts for replication and benchmarking. The key hyperparameters are Adam (\beta_1 = 0.5, \beta_2 = 0.999), learning rate 10^{-4}, batch size 1, and adversarial weight \lambda_{\mathrm{adv}} = 0.1. All results and performance metrics were obtained on a single 11 GB GPU. Source and documentation are provided at https://github.com/csjliang/LPTN.


LPTN leverages closed-form Laplacian pyramid decomposition to isolate global attribute transformations in the low-frequency domain and detail preservation in the high-frequency domain, applying specialized sub-networks for each. This dual-path strategy enables photorealistic translation of ultra-high-resolution images with a modest computational footprint, providing real-time performance and state-of-the-art translation fidelity on demanding tasks (Liang et al., 2021).
