Photorealistic Style Transfer via Wavelet Transforms (1903.09760v2)

Published 23 Mar 2019 in cs.CV

Abstract: Recent style transfer models have provided promising artistic results. However, given a photograph as a reference style, existing methods are limited by spatial distortions or unrealistic artifacts, which should not happen in real photographs. We introduce a theoretically sound correction to the network architecture that remarkably enhances photorealism and faithfully transfers the style. The key ingredient of our method is wavelet transforms that naturally fits in deep networks. We propose a wavelet corrected transfer based on whitening and coloring transforms (WCT$^2$) that allows features to preserve their structural information and statistical properties of VGG feature space during stylization. This is the first and the only end-to-end model that can stylize a $1024\times1024$ resolution image in 4.7 seconds, giving a pleasing and photorealistic quality without any post-processing. Last but not least, our model provides a stable video stylization without temporal constraints. Our code, generated images, and pre-trained models are all available at https://github.com/ClovaAI/WCT2.

Citations (335)

Summary

  • The paper introduces wavelet pooling and unpooling in deep networks to preserve structural details and enhance photorealism.
  • It achieves rapid processing of 1024×1024 images in 4.7 seconds with improved SSIM and Gram loss performance.
  • The method supports stable video stylization while minimizing artifacts, promoting practical applications in photorealistic AI.

The paper, "Photorealistic Style Transfer via Wavelet Transforms," introduces a novel approach addressing the limitations of existing style transfer methods when applied to photographs. These methods often result in spatial distortions and artifacts that detract from photorealism. The proposed solution leverages wavelet transforms to enhance photorealistic quality by preserving fine details and preventing unwanted artifacts.

Methodology

The core innovation is the integration of wavelet transforms into the network: max-pooling in the encoder is replaced with wavelet pooling, and the decoder uses the matching wavelet unpooling. Because wavelets allow exact reconstruction, this substitution retains structural information that max-pooling would discard. The resulting wavelet-corrected transfer, built on whitening and coloring transforms (WCT$^2$), preserves both the structural information of the features and the statistical properties of the VGG feature space during stylization.
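The exact-reconstruction property can be illustrated with a minimal NumPy sketch. It assumes a Haar basis and a single 2-D feature map; the summary above does not name the wavelet, and the function names `haar_pool` and `haar_unpool` are hypothetical, not taken from the authors' code. The labeling of the LH and HL bands is also a convention chosen here.

```python
import numpy as np

def haar_pool(x):
    """Split a 2-D array (even height and width) into four Haar sub-bands.

    Each output has half the height and width of x, like 2x2 pooling.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0   # low-pass: smooth content (scaled block average)
    lh = (-a + b - c + d) / 2.0  # differences across columns
    hl = (-a - b + c + d) / 2.0  # differences across rows
    hh = (a - b - c + d) / 2.0   # diagonal differences
    return ll, lh, hl, hh

def haar_unpool(ll, lh, hl, hh):
    """Invert haar_pool exactly by recombining the four sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll - lh - hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll + lh + hl + hh) / 2.0
    return x

# Usage: pooling followed by unpooling returns the input unchanged.
feature_map = np.arange(16, dtype=float).reshape(4, 4)
bands = haar_pool(feature_map)
restored = haar_unpool(*bands)
print(np.allclose(restored, feature_map))
```

Max-pooling keeps one value per 2×2 block and cannot be inverted; here the three high-frequency bands carry the detail that the low-pass band drops, which is why nothing is lost.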

The architecture stylizes progressively within a single encoder-decoder pass, in contrast to the recursive multi-level strategy of earlier models such as WCT and PhotoWCT. This design has two main advantages: it needs only a single decoder, which avoids the parameter overhead of the multi-level approach, and it limits the amplification of artifacts from one level to the next, which improves photorealism.
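The whitening and coloring step applied during this pass can be sketched as follows. This is a generic NumPy version of a whitening and coloring transform on flattened feature matrices, not the authors' implementation; the function name `whiten_and_color`, the `(channels, positions)` layout, and the regularizer `eps` are assumptions made for illustration.

```python
import numpy as np

def whiten_and_color(content, style, eps=1e-5):
    """Give content features the mean and covariance of style features.

    content: array of shape (C, N), C channels by N spatial positions.
    style:   array of shape (C, M).
    Returns an array of shape (C, N).
    """
    c_mean = content.mean(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    fc = content - c_mean
    fs = style - s_mean

    # Channel covariances, with a small ridge (assumed) for numerical stability.
    identity = np.eye(content.shape[0])
    c_cov = fc @ fc.T / (fc.shape[1] - 1) + eps * identity
    s_cov = fs @ fs.T / (fs.shape[1] - 1) + eps * identity

    # Whitening: remove the content correlations (covariance becomes ~identity).
    c_vals, c_vecs = np.linalg.eigh(c_cov)
    whitened = c_vecs @ np.diag(c_vals ** -0.5) @ c_vecs.T @ fc

    # Coloring: impose the style correlations, then restore the style mean.
    s_vals, s_vecs = np.linalg.eigh(s_cov)
    colored = s_vecs @ np.diag(s_vals ** 0.5) @ s_vecs.T @ whitened
    return colored + s_mean

# Usage on random stand-in features (real inputs would be VGG activations).
rng = np.random.default_rng(0)
content_feats = rng.normal(size=(4, 500))
style_feats = rng.normal(size=(4, 400)) * np.array([[1.0], [2.0], [3.0], [4.0]]) + 5.0
stylized = whiten_and_color(content_feats, style_feats)
print(np.allclose(np.cov(stylized), np.cov(style_feats), atol=1e-2))
```

The output keeps the spatial arrangement of the content features while its channel statistics match those of the style features.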

Results

The model stylizes a high-resolution (1024×1024) image in 4.7 seconds without any post-processing, which the paper presents as a significant speedup over PhotoWCT and Deep Photo Style Transfer. In the experimental evaluation it also scores better on both SSIM and Gram loss, so the gain in speed comes together with better visual quality.
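For readers unfamiliar with the Gram loss, the sketch below shows one common way to compute it with NumPy. The normalization by the number of positions and the names `gram_matrix` and `gram_loss` are assumptions; the exact definition used in the paper's evaluation is not given in this summary.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlation of a (C, N) feature matrix."""
    n_positions = features.shape[1]
    return features @ features.T / n_positions

def gram_loss(output_feats, style_feats):
    """Squared Frobenius distance between two Gram matrices (lower is better)."""
    diff = gram_matrix(output_feats) - gram_matrix(style_feats)
    return float(np.sum(diff ** 2))

# Usage: the Gram matrix ignores where features occur, only how they co-occur.
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
print(gram_matrix(feats))
print(gram_loss(feats, feats[:, ::-1]))  # reordering positions leaves the loss at 0
```

Because the Gram matrix discards spatial layout, it measures style similarity, while SSIM measures how well the structure of the content image is preserved; reporting both covers the two goals of photorealistic transfer.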

Moreover, the paper highlights the model's capability for stable video stylization, eliminating the need for additional temporal constraints. User studies further demonstrate a preference for outputs generated by this model, underscoring its practical utility and visual appeal.

Implications

Wavelet-based operations inside neural networks could benefit applications beyond style transfer, including image reconstruction, compression, and super-resolution, because they lose very little information and retain the original signal. Future work could focus on removing the dependency on semantic label maps to make the model more robust and more widely applicable.

In conclusion, the integration of wavelet transforms into style transfer models presents a significant step in maintaining photorealism while achieving style fidelity. The efficiency and practicality of the model make it a compelling choice for real-world applications, promising further advancements in both theoretical explorations and practical deployments in AI-driven photorealism.
