Laplacian Pyramid Translation Network (LPTN)
- LPTN is an image-to-image translation framework that leverages Laplacian pyramid decomposition to separate global attribute edits from fine detail refinement.
- It uses a lightweight low-frequency translator and progressive masking for high-frequency refinement, achieving state-of-the-art photorealism and computational efficiency.
- Evaluations on benchmarks like MIT-Adobe FiveK show real-time 4K translation with improved PSNR and SSIM, demonstrating its practical impact in high-res image processing.
The Laplacian Pyramid Translation Network (LPTN) is an image-to-image translation framework designed for high-resolution, photorealistic transformation with real-time inference, particularly targeting computational efficiency at ultra-high resolutions. LPTN achieves this by decomposing the input image into low-frequency and high-frequency components using a closed-form Laplacian pyramid. The method applies a lightweight translation network to the low-frequency component for global attribute changes (e.g., illumination, color), and employs a progressive masking strategy to selectively refine high-frequency components, thus preserving detailed content. This design avoids the expensive computation associated with directly processing high-resolution feature maps and enables 4K image translation in real-time on a single commodity GPU (Liang et al., 2021).
1. Laplacian Pyramid Decomposition and Reconstruction
LPTN builds a depth-$N$ Laplacian pyramid to hierarchically decompose an input image $I_0$ of resolution $H \times W$ as follows:
- Low-pass downsampling: For level $l = 0, \dots, N-1$,
$$I_{l+1} = \mathrm{down}(I_l),$$
where $\mathrm{down}(\cdot)$ is typically Gaussian filtering followed by 2× subsampling.
- Residual (high-frequency) extraction:
$$h_l = I_l - \mathrm{up}(I_{l+1}),$$
with $\mathrm{up}(\cdot)$ bilinear or transposed-convolution upsampling.
This yields $I_N$ (the coarsest low-frequency image) and high-frequency bands $[h_0, h_1, \dots, h_{N-1}]$ of decreasing resolution. Since the pyramid representation is lossless, reconstruction is given by:
$$I_l = h_l + \mathrm{up}(I_{l+1}), \quad l = N-1, \dots, 0,$$
where $\mathrm{up}(\cdot)$ denotes upsampling (and optional convolution) at level $l$.
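The decomposition and its inverse can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's exact filters: 2×2 average pooling stands in for Gaussian filtering, and nearest-neighbor upsampling stands in for the bilinear/transposed-conv `up` operator. The lossless property holds regardless of the filter choice, because reconstruction adds back exactly what decomposition subtracted.

```python
import numpy as np

def downsample(img):
    """Low-pass + 2x subsample (2x2 average pooling stands in for Gaussian blur)."""
    h, w = img.shape[:2]
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample(img):
    """Nearest-neighbor 2x upsampling (the paper uses bilinear/transposed conv)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return ([h_0, ..., h_{N-1}], I_N): high-frequency bands plus coarsest image."""
    bands, cur = [], img
    for _ in range(levels):
        nxt = downsample(cur)
        bands.append(cur - upsample(nxt))  # h_l = I_l - up(I_{l+1})
        cur = nxt
    return bands, cur

def reconstruct(bands, low):
    """Invert the decomposition: I_l = h_l + up(I_{l+1}), from coarse to fine."""
    cur = low
    for h in reversed(bands):
        cur = upsample(cur) + h
    return cur

img = np.random.rand(64, 64, 3)
bands, low = build_pyramid(img, levels=3)
rec = reconstruct(bands, low)
print(np.allclose(rec, img))  # lossless: True
```

Note that the band at level $l$ has resolution $\frac{H}{2^l} \times \frac{W}{2^l}$, so almost all pixels live in the finest one or two bands, which LPTN deliberately processes with the cheapest operations.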
2. Network Architecture
The LPTN model explicitly decouples transformation tasks between scales:
(a) Low-Frequency Translator
- Input: $I_N$ of shape $3 \times \frac{H}{2^N} \times \frac{W}{2^N}$.
- Module:
  - Expand: $1 \times 1$ convolution, $3 \to C$ channels.
  - Five residual blocks, each with:
    - $3 \times 3$ conv, $C \to C$, LeakyReLU
    - $3 \times 3$ conv, $C \to C$, LeakyReLU
    - Instance norm precedes each block.
  - Project: $1 \times 1$ conv, $C \to 3$; Tanh activation.
- Output: a residual $\delta = T(I_N)$ is predicted, and the translated low-frequency component is
$$\hat{I}_N = \mathrm{Tanh}(I_N + \delta),$$
where $T$ is the stack described above.
The parameter count for this module is small, since it operates only at the coarsest resolution.
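As a back-of-the-envelope check on the module's size, the layer counts above can be tallied directly. The channel width $C = 64$ here is an assumed value for illustration (the exact width is in the paper/repository), and instance-norm affine parameters (roughly $2C$ per normalization) are omitted:

```python
# Rough parameter count of the low-frequency translator, assuming an
# illustrative channel width C = 64 (not confirmed by this summary).
C = 64

expand = 3 * C + C                    # 1x1 conv 3->C: weights + biases
res_block = 2 * (C * C * 3 * 3 + C)   # two 3x3 convs C->C per residual block
project = C * 3 + 3                   # 1x1 conv C->3
total = expand + 5 * res_block + project
print(total)  # 369731, i.e. well under half a million parameters
```

Even under this generous width assumption the translator stays far smaller than a full-resolution encoder-decoder, which is the point of confining translation to $I_N$.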
(b) High-Frequency Refinement via Progressive Masking
Refinement is performed for each Laplacian high-frequency band $h_l$ using learned per-pixel masks $M_l$:
- Stage 1 (band $h_{N-1}$):
  - Upsample $I_N$ and $\hat{I}_N$ to match $h_{N-1}$'s size.
  - Concatenate $[h_{N-1}, \mathrm{up}(I_N), \mathrm{up}(\hat{I}_N)]$.
  - Pass through a small CNN to produce a per-pixel mask $M_{N-1}$.
  - Refine: $\hat{h}_{N-1} = h_{N-1} \odot M_{N-1}$.
- Subsequent stages (bands $N-2$ down to $0$):
  - Bilinearly upsample the previous mask $M_{l+1}$ by a factor of 2 to initialize $M_l$.
  - Refine $M_l$ with a two-layer conv block (LeakyReLU).
  - Compute $\hat{h}_l = h_l \odot M_l$.
After all bands are refined, reconstruct $\hat{I}_0$ using $\hat{I}_N$ and $[\hat{h}_0, \dots, \hat{h}_{N-1}]$ analogously to the original Laplacian reconstruction.
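The masking cascade can be sketched in NumPy. This sketch takes the coarsest mask $M_{N-1}$ as given (in the real model it comes from the small CNN over the concatenated inputs), uses nearest-neighbor upsampling in place of bilinear, and omits the per-stage two-layer conv refinement; only the upsample-then-multiply structure is shown:

```python
import numpy as np

def refine_bands(bands, mask_coarse):
    """Refine each high-frequency band h_l by a per-pixel mask M_l.

    mask_coarse plays the role of M_{N-1}; each finer mask is initialized by
    2x upsampling the previous one. (The real model also refines each
    upsampled mask with a small conv block before applying it.)
    """
    refined, mask = [], mask_coarse
    for h in reversed(bands):                      # from h_{N-1} up to h_0
        if mask.shape != h.shape[:2]:
            mask = mask.repeat(2, axis=0).repeat(2, axis=1)  # stands in for bilinear
        refined.append(h * mask[..., None])        # h_l ⊙ M_l, broadcast over RGB
    return list(reversed(refined))                 # back to [h_0, ..., h_{N-1}] order

bands = [np.random.rand(s, s, 3) for s in (64, 32, 16)]  # toy 3-level bands
mask = np.full((16, 16), 0.5)                            # toy coarsest mask
refined = refine_bands(bands, mask)
```

Because the mask is only ever upsampled and lightly refined, the per-band cost stays proportional to the band's own (mostly small) resolution rather than to a full forward pass at each scale.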
3. Training Objectives and Optimization
LPTN utilizes both pixel-level and adversarial objectives:
- Reconstruction loss: $\mathcal{L}_{\mathrm{recon}} = \|\hat{I}_0 - I_0\|_2^2$, which keeps the translated output close to the input content.
- Adversarial loss: Least-Squares GAN (LSGAN)-style with a three-scale patch discriminator $D$:
  - Generator objective: $\mathcal{L}_{\mathrm{adv}} = \mathbb{E}\big[(D(\hat{I}_0) - 1)^2\big]$
  - Discriminator objective: $\mathcal{L}_{D} = \mathbb{E}\big[(D(I_y) - 1)^2\big] + \mathbb{E}\big[D(\hat{I}_0)^2\big]$, where $I_y$ is a real image from the target domain.
- Joint objective: $\mathcal{L} = \mathcal{L}_{\mathrm{recon}} + \lambda \mathcal{L}_{\mathrm{adv}}$,
with a weighting coefficient $\lambda$ balancing the two terms.
No perceptual loss was used in the published results.
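The objectives above are simple to write down concretely. A minimal NumPy sketch, using the mean-squared form for illustration and a symbolic `lam` for the paper's $\lambda$ (its exact value is not restated here):

```python
import numpy as np

def lsgan_g_loss(d_fake):
    """Generator term: push D's scores on translated images toward 1."""
    return np.mean((d_fake - 1.0) ** 2)

def lsgan_d_loss(d_real, d_fake):
    """Discriminator term: least-squares targets, real -> 1 and fake -> 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def total_g_loss(output, inp, d_fake, lam):
    """Joint generator objective: L_recon + lambda * L_adv."""
    recon = np.mean((output - inp) ** 2)  # pixel-level content preservation
    return recon + lam * lsgan_g_loss(d_fake)
```

A fully fooled discriminator (scores of 1 on fakes) zeroes the generator's adversarial term, while a perfect discriminator (1 on real, 0 on fake) zeroes its own loss; the reconstruction term then anchors the output to the input content.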
4. Computational Efficiency and Inference Performance
LPTN's architecture strategically exploits the sparsity of the Laplacian representation for computational savings:
- The low-frequency translator operates on $I_N$, whose resolution is reduced by $2^N$ per spatial dimension, cutting its cost by a factor of roughly $4^N$ compared to full-resolution processing.
- High-frequency refinement involves only small-resolution feature maps: each mask is learned at a coarse scale and merely upsampled (plus a lightweight conv block) per band, giving a further reduction per band.
- Runtime benchmarks on an 11 GB GPU (RTX 2080Ti):
| Resolution | $N$ (pyramid levels) | Time (ms) | FPS |
|---|---|---|---|
| 480p | 3 | 3 | 333 |
| 1080p | 4 | 7 | 143 |
| 2K | 4 | 15 | 67 |
| 4K | 5 | 16 | 62 |
Decomposition and reconstruction together require less than 2 ms per 4K image. LPTN thus sustains real-time performance even at 4K.
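The FPS column is simply the reciprocal of the per-frame latency, which makes the table easy to sanity-check:

```python
# FPS follows directly from the per-frame latency: FPS = 1000 / time_ms.
times_ms = {"480p": 3, "1080p": 7, "2K": 15, "4K": 16}
fps = {res: round(1000 / t) for res, t in times_ms.items()}
print(fps)  # e.g. 4K: 1000 / 16 = 62.5 -> ~62 FPS, consistent with the table
```

All four rows reproduce the reported FPS figures, so the benchmark columns are internally consistent.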
5. Quantitative and Qualitative Evaluation
LPTN was evaluated on image translation and photo-retouching benchmarks, notably the MIT-Adobe FiveK task. Table results for 1080p resolution:
| Method | PSNR | SSIM |
|---|---|---|
| CycleGAN | 20.86 | 0.846 |
| UNIT | 19.32 | 0.802 |
| MUNIT | 20.28 | 0.815 |
| White-Box | 21.26 | 0.872 |
| DPE | 21.94 | 0.885 |
| LPTN (N=3) | 22.09 | 0.883 |
LPTN matches or outperforms prior image-to-image translation (I2I) baselines and is comparable to state-of-the-art specifically designed photo retouching methods. Qualitatively, LPTN achieves faithful global edits (illumination, color) while preserving edge-level detail. In user studies (20 participants, 20 images), LPTN achieved highest photorealism over 75% of the time and style faithfulness approximately 50% of the time for day-to-night translation.
6. Implementation and Reproducibility
LPTN is available as open-source software (PyTorch 1.x, Python 3.7) with pre-trained models and scripts for replication and benchmarking. Training uses the Adam optimizer with batch size 1 and an adversarial weight $\lambda$; the exact learning rate, Adam momentum settings, and $\lambda$ are given in the paper and repository. All results and performance metrics were obtained on a single 11 GB GPU. Source and documentation are provided at https://github.com/csjliang/LPTN.
LPTN leverages closed-form Laplacian pyramid decomposition to isolate global attribute transformations in the low-frequency domain and detail preservation in the high-frequency domain, applying specialized sub-networks for each. This dual-path strategy enables photorealistic translation of ultra-high-resolution images with a modest computational footprint, providing real-time performance and state-of-the-art translation fidelity on demanding tasks (Liang et al., 2021).