
Image Processing Using Multi-Code GAN Prior (1912.07116v2)

Published 15 Dec 2019 in cs.CV

Abstract: Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both of the methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate the well-trained GANs as effective prior to a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables the trained GAN models as prior to many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.

Multi-Code GAN Prior for Image Processing Applications

This paper introduces a novel method termed mGANprior, which leverages multiple latent codes to utilize pre-trained GAN models as effective priors for various image processing tasks. The approach addresses persistent challenges in applying GAN-generated images to real-world image processing, particularly the difficulty in achieving accurate image reconstruction through existing inversion methods, which typically rely on a single latent code.

The primary innovation of mGANprior is its use of multiple latent codes, each generating a feature map at an intermediate layer of the GAN's generator. These maps are composed using adaptive channel importance weights to recover the input image. This enriches the expressive power of the GAN's latent space, allowing it to capture more complex image structures and details and thus produce high-quality reconstructions of real images. The paper shows that this over-parameterization of the latent space yields substantially higher reconstruction fidelity than existing single-code inversion methods.
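The composition step can be sketched as follows. This is a hedged illustration in NumPy, not the paper's implementation: `compose_features`, the feature shapes, and the toy values are assumptions chosen only to show how per-code channel weights combine N intermediate feature maps into one.

```python
import numpy as np

def compose_features(feature_maps, channel_weights):
    """Compose N intermediate feature maps (each C x H x W) into one,
    weighting each map's channels by its adaptive importance vector
    (length C), i.e. F = sum_n alpha_n * F_n with alpha broadcast
    over the spatial dimensions."""
    composed = np.zeros_like(feature_maps[0])
    for f_n, alpha_n in zip(feature_maps, channel_weights):
        composed += alpha_n[:, None, None] * f_n  # broadcast weights over H, W
    return composed

# Toy example: 2 latent codes, 3 channels, 4x4 spatial features.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((3, 4, 4)) for _ in range(2)]
alphas = [np.ones(3), np.zeros(3)]  # second code fully suppressed
out = compose_features(feats, alphas)
```

In the actual method, the latent codes and the channel weights are both optimized jointly via back-propagation until the composed features reproduce the target image; here the weights are fixed only to make the mechanics visible.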

Extensive experiments demonstrate that mGANprior not only exceeds the capabilities of previous inversion methods but also makes GAN models applicable as priors for a broader range of tasks, including image colorization, super-resolution, image inpainting, and semantic manipulation. This is achieved without needing to retrain the models, highlighting the flexibility and efficiency of mGANprior in reusing the knowledge embedded within GANs.
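The reuse-without-retraining property follows from how each task is posed: only the degradation operator applied to the generator output changes, while the GAN itself stays frozen. The sketch below illustrates this with assumed NumPy stand-ins (`downsample`, `grayscale`, `reconstruction_loss` are illustrative names, not the paper's code); in practice the loss would be minimized over the latent codes and channel weights.

```python
import numpy as np

def downsample(img, factor=2):
    """Average-pool degradation, as used for super-resolution."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def grayscale(img_rgb):
    """Channel-mean degradation, as used for colorization (img_rgb: 3 x H x W)."""
    return img_rgb.mean(axis=0)

def reconstruction_loss(generated, observed, phi):
    """L2 loss between the degraded generator output phi(G(z)) and the
    observed (degraded) image; the generator itself is never retrained."""
    return float(((phi(generated) - observed) ** 2).mean())
```

For inpainting, `phi` would instead be a binary mask; for plain inversion, `phi` is the identity. Swapping `phi` is the entire task-specific change.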

Key quantitative results are reported, illustrating that mGANprior significantly outperforms conventional GAN inversion techniques. For instance:

  • mGANprior achieved the highest Peak Signal-to-Noise Ratio (PSNR) and the lowest Learned Perceptual Image Patch Similarity (LPIPS) across multiple datasets including LSUN bedroom and CelebA-HQ.
  • In tasks such as image colorization and super-resolution, mGANprior consistently produced results comparable to or better than those of specialized models, such as Zhang et al.'s colorization method and ESRGAN, respectively.
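For reference, the PSNR metric cited above measures reconstruction fidelity in decibels from the mean squared error; a minimal sketch (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak Signal-to-Noise Ratio between two images: higher is better.
    PSNR = 10 * log10(max_val^2 / MSE)."""
    mse = ((x - y) ** 2).mean()
    return float(10 * np.log10(max_val ** 2 / mse))

# A uniform error of 0.1 gives MSE = 0.01 and hence PSNR = 20 dB.
score = psnr(np.zeros((8, 8)), np.full((8, 8), 0.1))
```

LPIPS, by contrast, is a learned perceptual distance computed from deep network features (lower is better), which is why the paper reports both.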

The implications of this work are multifaceted. Practically, mGANprior makes the high-quality generative capabilities of GANs available for real image processing tasks without additional training, broadening the applications of generative models in image editing and restoration. Theoretically, the results shed light on the layer-wise representations learned by GAN models, contributing to a deeper understanding of how abstract, high-level semantic structures are encoded and how they can be harnessed for image reconstruction.

Future research directions include exploring the scalability of mGANprior to more diverse image categories and larger datasets. Additionally, further investigations could focus on optimizing channel importance weights more effectively and automating the selection of intermediate layers for optimal feature composition.

In summary, this paper provides a comprehensive exploration of enhancing GAN inversion through multiple latent codes, presenting a robust framework that combines the strengths of well-trained GAN models with practical image processing applications, while simultaneously advancing the theoretical understanding of GAN layer-wise knowledge representation.

Authors (3)
  1. Jinjin Gu (56 papers)
  2. Yujun Shen (111 papers)
  3. Bolei Zhou (134 papers)
Citations (301)