Laplacian Pyramid Translation Network (LPTN)
- LPTN is an image-to-image translation framework that leverages Laplacian pyramid decomposition to separate global attribute edits from fine detail refinement.
- It uses a lightweight low-frequency translator and progressive masking for high-frequency refinement, achieving state-of-the-art photorealism and computational efficiency.
- Evaluations on benchmarks like MIT-Adobe FiveK show real-time 4K translation with improved PSNR and SSIM, demonstrating its practical impact in high-res image processing.
The Laplacian Pyramid Translation Network (LPTN) is an image-to-image translation framework designed for high-resolution, photorealistic transformation with real-time inference, particularly targeting computational efficiency at ultra-high resolutions. LPTN achieves this by decomposing the input image into low-frequency and high-frequency components using a closed-form Laplacian pyramid. The method applies a lightweight translation network to the low-frequency component for global attribute changes (e.g., illumination, color), and employs a progressive masking strategy to selectively refine high-frequency components, thus preserving detailed content. This design avoids the expensive computation associated with directly processing high-resolution feature maps and enables 4K image translation in real-time on a single commodity GPU (Liang et al., 2021).
1. Laplacian Pyramid Decomposition and Reconstruction
LPTN builds a depth-$N$ Laplacian pyramid to hierarchically decompose an input image $I_0$ of resolution $H \times W$ as follows:
- Low-pass downsampling: For level $l = 0, \dots, N-1$,
$$I_{l+1} = \mathrm{down}(I_l),$$
where $\mathrm{down}(\cdot)$ is typically Gaussian filtering followed by 2× subsampling.
- Residual (high-frequency) extraction:
$$h_l = I_l - \mathrm{up}(I_{l+1}),$$
with $\mathrm{up}(\cdot)$ bilinear or transposed-convolution upsampling.
This yields $I_N$ (the coarsest low-frequency image) and high-frequency bands $[h_0, h_1, \dots, h_{N-1}]$ of decreasing resolution. Since the pyramid representation is lossless, reconstruction is given by:
$$I_l = h_l + \mathrm{up}(I_{l+1}), \quad l = N-1, \dots, 0,$$
where $\mathrm{up}(\cdot)$ denotes upsampling (and optional convolution) at level $l$.
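The decomposition and its inverse can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's exact filters: 2×2 average pooling stands in for Gaussian filtering, and nearest-neighbor upsampling stands in for the bilinear/transposed-conv `up` operator. The lossless property holds regardless of the filter choice, because reconstruction adds back exactly what decomposition subtracted.

```python
import numpy as np

def downsample(img):
    """Low-pass + 2x subsample (2x2 average pooling stands in for Gaussian blur)."""
    h, w = img.shape[:2]
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbor 2x upsampling (the paper uses bilinear/transposed conv)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return ([h_0, ..., h_{N-1}], I_N): high-frequency bands plus coarsest image."""
    bands, cur = [], img
    for _ in range(levels):
        nxt = downsample(cur)
        bands.append(cur - upsample(nxt))  # h_l = I_l - up(I_{l+1})
        cur = nxt
    return bands, cur

def reconstruct(bands, low):
    """Invert the decomposition: I_l = h_l + up(I_{l+1}), from coarse to fine."""
    cur = low
    for h in reversed(bands):
        cur = upsample(cur) + h
    return cur

img = np.random.rand(64, 64, 3)
bands, low = build_pyramid(img, levels=3)
rec = reconstruct(bands, low)
print(np.allclose(rec, img))  # lossless: True
```

Note that the band at level $l$ has resolution $\frac{H}{2^l} \times \frac{W}{2^l}$, so almost all pixels live in the finest one or two bands, which LPTN deliberately processes with the cheapest operations.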
2. Network Architecture
The LPTN model explicitly decouples transformation tasks between scales:
(a) Low-Frequency Translator
- Input: $I_N$ of shape $3 \times \frac{H}{2^N} \times \frac{W}{2^N}$.
- Module:
  - Expand: $1 \times 1$ convolution, $3 \to C$ channels.
  - Five residual blocks, each with:
    - $3 \times 3$ conv, $C \to C$, LeakyReLU
    - $3 \times 3$ conv, $C \to C$, LeakyReLU
    - Instance norm precedes each block.
  - Project: $1 \times 1$ conv, $C \to 3$; Tanh activation.
- Output: a residual $\delta = T(I_N)$ is predicted, and the translated low-frequency component is
$$\hat{I}_N = \mathrm{Tanh}(I_N + \delta),$$
where $T$ is the stack described above.
The parameter count for this module is small, since it operates only at the coarsest resolution.
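As a back-of-the-envelope check on the module's size, the layer counts above can be tallied directly. The channel width $C = 64$ here is an assumed value for illustration (the exact width is in the paper/repository), and instance-norm affine parameters (roughly $2C$ per normalization) are omitted:

```python
# Rough parameter count of the low-frequency translator, assuming an
# illustrative channel width C = 64 (not confirmed by this summary).
C = 64

expand = 3 * C + C                    # 1x1 conv 3->C: weights + biases
res_block = 2 * (C * C * 3 * 3 + C)   # two 3x3 convs C->C per residual block
project = C * 3 + 3                   # 1x1 conv C->3
total = expand + 5 * res_block + project
print(total)  # 369731, i.e. well under half a million parameters
```

Even under this generous width assumption the translator stays far smaller than a full-resolution encoder-decoder, which is the point of confining translation to $I_N$.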
(b) High-Frequency Refinement via Progressive Masking
Refinement is performed for each Laplacian high-frequency band $h_l$ using learned per-pixel masks $M_l$:
- Stage 1 (band $h_{N-1}$):
  - Upsample $I_N$ and $\hat{I}_N$ to match $h_{N-1}$'s size.
  - Concatenate $[h_{N-1}, \mathrm{up}(I_N), \mathrm{up}(\hat{I}_N)]$.
  - Pass through a small CNN to produce a per-pixel mask $M_{N-1}$.
  - Refine: $\hat{h}_{N-1} = h_{N-1} \odot M_{N-1}$.
- Subsequent stages (bands $N-2$ down to $0$):
  - Bilinearly upsample the previous mask $M_{l+1}$ by a factor of 2 to initialize $M_l$.
  - Refine $M_l$ with a two-layer conv block (LeakyReLU).
  - Compute $\hat{h}_l = h_l \odot M_l$.
After all bands are refined, reconstruct $\hat{I}_0$ using $\hat{I}_N$ and $[\hat{h}_0, \dots, \hat{h}_{N-1}]$ analogously to the original Laplacian reconstruction.
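The masking cascade can be sketched in NumPy. This sketch takes the coarsest mask $M_{N-1}$ as given (in the real model it comes from the small CNN over the concatenated inputs), uses nearest-neighbor upsampling in place of bilinear, and omits the per-stage two-layer conv refinement; only the upsample-then-multiply structure is shown:

```python
import numpy as np

def refine_bands(bands, mask_coarse):
    """Refine each high-frequency band h_l by a per-pixel mask M_l.

    mask_coarse plays the role of M_{N-1}; each finer mask is initialized by
    2x upsampling the previous one. (The real model also refines each
    upsampled mask with a small conv block before applying it.)
    """
    refined, mask = [], mask_coarse
    for h in reversed(bands):                      # from h_{N-1} up to h_0
        if mask.shape != h.shape[:2]:
            mask = mask.repeat(2, axis=0).repeat(2, axis=1)  # stands in for bilinear
        refined.append(h * mask[..., None])        # h_l ⊙ M_l, broadcast over RGB
    return list(reversed(refined))                 # back to [h_0, ..., h_{N-1}] order

bands = [np.random.rand(s, s, 3) for s in (64, 32, 16)]  # toy 3-level bands
mask = np.full((16, 16), 0.5)                            # toy coarsest mask
refined = refine_bands(bands, mask)
```

Because the mask is only ever upsampled and lightly refined, the per-band cost stays proportional to the band's own (mostly small) resolution rather than to a full forward pass at each scale.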
3. Training Objectives and Optimization
LPTN utilizes both pixel-level and adversarial objectives:
- Reconstruction loss: $\mathcal{L}_{\mathrm{recon}} = \|\hat{I}_0 - I_0\|_2^2$, which keeps the translated output close to the input content.
- Adversarial loss: Least-Squares GAN (LSGAN)-style with a three-scale patch discriminator $D$:
  - Generator objective: $\mathcal{L}_{\mathrm{adv}} = \mathbb{E}\big[(D(\hat{I}_0) - 1)^2\big]$
  - Discriminator objective: $\mathcal{L}_{D} = \mathbb{E}\big[(D(I_y) - 1)^2\big] + \mathbb{E}\big[D(\hat{I}_0)^2\big]$, where $I_y$ is a real image from the target domain.
- Joint objective: $\mathcal{L} = \mathcal{L}_{\mathrm{recon}} + \lambda \mathcal{L}_{\mathrm{adv}}$,
with a weighting coefficient $\lambda$ balancing the two terms.
No perceptual loss was used in the published results.
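The objectives above are simple to write down concretely. A minimal NumPy sketch, using the mean-squared form for illustration and a symbolic `lam` for the paper's $\lambda$ (its exact value is not restated here):

```python
import numpy as np

def lsgan_g_loss(d_fake):
    """Generator term: push D's scores on translated images toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

def lsgan_d_loss(d_real, d_fake):
    """Discriminator term: least-squares targets, real -> 1 and fake -> 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def total_g_loss(output, inp, d_fake, lam):
    """Joint generator objective: L_recon + lambda * L_adv."""
    recon = np.mean((output - inp) ** 2)  # pixel-level content preservation
    return recon + lam * lsgan_g_loss(d_fake)
```

A fully fooled discriminator (scores of 1 on fakes) zeroes the generator's adversarial term, while a perfect discriminator (1 on real, 0 on fake) zeroes its own loss; the reconstruction term then anchors the output to the input content.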
4. Computational Efficiency and Inference Performance
LPTN's architecture strategically exploits the sparsity of the Laplacian representation for computational savings:
- The low-frequency translator operates on $I_N$, whose resolution is reduced by $2^N$ per spatial dimension, cutting its cost by a factor of roughly $4^N$ compared to full-resolution processing.
- High-frequency refinement involves only small-resolution feature maps: each mask is learned at a coarse scale and merely upsampled (plus a lightweight conv block) per band, giving a further reduction per band.
- Runtime benchmarks on an 11 GB GPU (RTX 2080Ti):
| Resolution | $N$ (pyramid levels) | Time (ms) | FPS |
|---|---|---|---|
| 480p | 3 | 3 | 333 |
| 1080p | 4 | 7 | 143 |
| 2K | 4 | 15 | 67 |
| 4K | 5 | 16 | 62 |
Decomposition and reconstruction together require less than 2 ms per 4K image. LPTN thus sustains real-time performance even at 4K.
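The FPS column is simply the reciprocal of the per-frame latency, which makes the table easy to sanity-check:

```python
# FPS follows directly from the per-frame latency: FPS = 1000 / time_ms.
times_ms = {"480p": 3, "1080p": 7, "2K": 15, "4K": 16}
fps = {res: round(1000 / t) for res, t in times_ms.items()}
print(fps)  # e.g. 4K: 1000 / 16 = 62.5 -> ~62 FPS, consistent with the table
```

All four rows reproduce the reported FPS figures, so the benchmark columns are internally consistent.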
5. Quantitative and Qualitative Evaluation
LPTN was evaluated on image translation and photo-retouching benchmarks, notably the MIT-Adobe FiveK task. Table results for 1080p resolution:
| Method | PSNR | SSIM |
|---|---|---|
| CycleGAN | 20.86 | 0.846 |
| UNIT | 19.32 | 0.802 |
| MUNIT | 20.28 | 0.815 |
| White-Box | 21.26 | 0.872 |
| DPE | 21.94 | 0.885 |
| LPTN (N=3) | 22.09 | 0.883 |
LPTN matches or outperforms prior image-to-image translation (I2I) baselines and is comparable to state-of-the-art specifically designed photo retouching methods. Qualitatively, LPTN achieves faithful global edits (illumination, color) while preserving edge-level detail. In user studies (20 participants, 20 images), LPTN achieved highest photorealism over 75% of the time and style faithfulness approximately 50% of the time for day-to-night translation.
6. Implementation and Reproducibility
LPTN is available as open-source software (PyTorch 1.x, Python 3.7) with pre-trained models and scripts for replication and benchmarking. Training uses the Adam optimizer with batch size 1 and an adversarial weight $\lambda$; the exact learning rate, Adam momentum settings, and $\lambda$ are given in the paper and repository. All results and performance metrics were obtained on a single 11 GB GPU. Source and documentation are provided at https://github.com/csjliang/LPTN.
LPTN leverages closed-form Laplacian pyramid decomposition to isolate global attribute transformations in the low-frequency domain and detail preservation in the high-frequency domain, applying specialized sub-networks for each. This dual-path strategy enables photorealistic translation of ultra-high-resolution images with a modest computational footprint, providing real-time performance and state-of-the-art translation fidelity on demanding tasks (Liang et al., 2021).