Analyzing and Improving the Image Quality of StyleGAN
The paper "Analyzing and Improving the Image Quality of StyleGAN" by Tero Karras et al. explores the weaknesses of the original StyleGAN architecture, identifies the root causes of its characteristic artifacts, and proposes a series of modifications to enhance image quality. This analysis and subsequent improvements pertain to both the generator's architecture and the associated training methods.
The style-based architecture of StyleGAN had set a standard in data-driven unconditional generative image modeling. However, it exhibited certain artifacts, which the authors categorize and systematically address. The key innovations discussed in the paper encompass revised normalization techniques, alternative network architectures, and innovative regularization strategies. Below is a structured analysis of the key contributions and findings.
Identification and Removal of Normalization Artifacts
The paper identifies that the original StyleGAN's use of Adaptive Instance Normalization (AdaIN) led to persistent, water-droplet-like blob artifacts in generated images. These artifacts originated from the generator's attempt to sneak signal-strength information past AdaIN by creating a strong, localized spike in the feature maps. The authors propose replacing AdaIN with a "weight demodulation" technique that scales the convolution weights based on the expected statistics of the incoming signal, rather than explicitly forcing normalization on the actual contents of the feature maps.
This change eradicates the blob artifacts by removing the generator's incentive to amplify certain features disproportionately. Empirical results demonstrate that the revised technique retains full controllability, including style mixing, while making image quality more consistent. The Fréchet Inception Distance (FID) remains largely unchanged, but the precision-recall balance shifts slightly in favor of recall.
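As a rough single-sample illustration (not the authors' implementation; shapes and names here are assumptions), the modulate/demodulate step on convolution weights can be sketched in NumPy:

```python
import numpy as np

def modulate_demodulate(weights, style, eps=1e-8):
    """Sketch of StyleGAN2-style weight (de)modulation for one sample.

    weights: (out_ch, in_ch, kh, kw) convolution weights
    style:   (in_ch,) per-input-channel scales from the style
    """
    # Modulate: scale each input channel of the weights by the style.
    w = weights * style[np.newaxis, :, np.newaxis, np.newaxis]
    # Demodulate: rescale so each output channel has unit expected magnitude,
    # relying on statistical assumptions about the input signal instead of
    # normalizing the actual feature maps.
    demod = 1.0 / np.sqrt(np.sum(w ** 2, axis=(1, 2, 3)) + eps)
    return w * demod[:, np.newaxis, np.newaxis, np.newaxis]
```

After demodulation, each output channel's weight vector has (approximately) unit L2 norm, so the convolution preserves expected activation scale without ever touching the feature maps directly.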
Path Length Regularization and Lazy Regularization
The authors introduce "path length regularization," which encourages a smooth mapping from the latent space to the image: a fixed-size step in the intermediate latent space W should produce a non-zero change of fixed magnitude in the image, regardless of direction. Concretely, the regularizer penalizes the deviation of the Jacobian-vector-product norm from a running average, nudging the generator toward well-conditioned behavior and more consistent latent-space interpolation, which in turn correlates with improved overall image quality.
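The core of the penalty can be illustrated on a toy generator whose Jacobian is available in closed form; in the real setting, the product J_w^T y is obtained with a single backward pass rather than by materializing the Jacobian, and the target a is a running average of observed norms. All names below are illustrative:

```python
import numpy as np

def path_length_penalty(jacobian, target_a, rng):
    """Sketch of the path length penalty (||J_w^T y|| - a)^2.

    jacobian: (image_dim, latent_dim) Jacobian of the generator at w
    target_a: moving target, a running average of past norms
    """
    image_dim = jacobian.shape[0]
    # Random image-space direction, scaled so the norm has a stable magnitude.
    y = rng.standard_normal(image_dim) / np.sqrt(image_dim)
    jty = jacobian.T @ y          # in practice: one backprop through the generator
    norm = np.linalg.norm(jty)
    return (norm - target_a) ** 2, norm
```

For a well-conditioned generator the norms concentrate around a single value, so the penalty stays small; that is exactly the behavior the regularizer rewards.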
A practical optimization, termed "lazy regularization," is adopted to reduce computational expense. The regularization terms are computed less frequently than the main loss (for example, once every 16 minibatches) with their weight scaled up correspondingly, without measurable detriment to effectiveness. Together, these strategies yield a significant improvement in the perceptual path length (PPL) metric and in overall image quality, with lower PPL scores observed to correlate with higher-quality images.
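A minimal sketch of the lazy-regularization schedule, assuming a generic parameter vector and gradient callables (all names here are hypothetical, not the paper's code):

```python
import numpy as np

def lazy_step(step, params, grad_main, grad_reg, lr=0.01, reg_interval=16):
    """One gradient step with lazy regularization.

    The regularizer's gradient is evaluated only every `reg_interval` steps
    and scaled by the interval, keeping its time-averaged strength unchanged
    while saving most of its computational cost.
    """
    g = grad_main(params)
    if step % reg_interval == 0:
        g = g + reg_interval * grad_reg(params)
    return params - lr * g
```

Because the regularizer typically requires an extra backward pass, evaluating it on a sparse schedule noticeably reduces wall-clock and memory costs.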
Alternative Architectures and Progressive Growing
The paper revisits progressive growing, which, while stabilizing high-resolution synthesis, introduces its own artifacts: fine details exhibit a strong preference for fixed image locations (for example, teeth and eyes that stay in place rather than following the pose). The authors explore alternative network architectures, including skip connections and residual networks, to mitigate these issues.
An empirical evaluation reveals that a skip-connection generator paired with a residual discriminator delivers the best performance without any progressive growing. This configuration shows notable improvements in both FID and PPL, and once network capacity is increased, the generator makes full use of its highest-resolution layers, resolving the capacity limitation observed in the original architecture.
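The skip generator's output path, which sums upsampled RGB contributions from every resolution, can be sketched as follows; the nearest-neighbor upsampling and the per-resolution `to_rgb` callables are simplifying assumptions:

```python
import numpy as np

def upsample2x(img):
    # Nearest-neighbor 2x upsampling of a (channels, h, w) image.
    return img.repeat(2, axis=1).repeat(2, axis=2)

def skip_generator_output(features, to_rgb_layers):
    """Sum RGB contributions from every resolution, coarse to fine.

    features:      list of (channels_i, h_i, w_i) feature maps, each at
                   double the previous resolution
    to_rgb_layers: matching list of callables mapping features to RGB
    """
    img = None
    for feat, to_rgb in zip(features, to_rgb_layers):
        rgb = to_rgb(feat)
        img = rgb if img is None else upsample2x(img) + rgb
    return img
```

Because every resolution contributes directly to the final image, training can initially rely on the coarse layers and gradually shift toward the fine ones, recovering the main benefit of progressive growing without ever changing the network topology.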
Image Projection and Attribution
The paper also explores the projection of images into the latent space, which has applications in image manipulation and source attribution. A refined projection method demonstrates that images generated by the improved StyleGAN2 generator can be projected back into the latent space markedly more accurately and reliably than before. This makes it easier to attribute a generated image to the specific network that produced it, providing a stronger framework for verifying the authenticity of synthetic media.
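Projection amounts to optimizing a latent code so that the generated image matches a target. The paper optimizes in W with added noise and a perceptual (LPIPS-based) distance; the skeleton of the procedure can be shown with a toy linear generator and squared error, where everything below is an illustrative assumption:

```python
import numpy as np

def project(target, A, steps=500, lr=0.05):
    """Project `target` into the latent space of a toy generator g(w) = A @ w.

    In StyleGAN2 the generator is nonlinear and the loss is perceptual,
    but the structure -- gradient descent on the latent code -- is the same.
    """
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        residual = A @ w - target
        w -= lr * 2.0 * A.T @ residual   # gradient of ||g(w) - target||^2 w.r.t. w
    return w
```

If the target truly lies in the generator's range, the optimization recovers a latent code that reproduces it almost exactly; real images off the generator's manifold only admit an approximate match, which is what makes projection quality usable as an attribution signal.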
Empirical Results and Dataset Evaluation
Extensive empirical evaluation across datasets such as FFHQ and several LSUN categories underscores the improvements brought by StyleGAN2. The revised architecture and training techniques result in quantifiable, consistent enhancements in image quality metrics.
For instance, in the LSUN Car dataset, StyleGAN2 achieved an FID score of 2.32 compared to the original StyleGAN's 3.27. A similar trend is observed across other datasets like LSUN Church and LSUN Cat, indicating the robustness and generalizability of the proposed modifications.
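FID itself compares Gaussian fits to Inception features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2(cov1 cov2)^(1/2)). A NumPy sketch of the metric, assuming precomputed feature statistics, using a symmetric-square-root rewriting to avoid non-symmetric matrix roots:

```python
import numpy as np

def _sqrtm_psd(m):
    # Matrix square root of a symmetric positive semi-definite matrix.
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between two Gaussians fitted to feature activations."""
    # Tr((cov1 cov2)^(1/2)) equals Tr((cov2^(1/2) cov1 cov2^(1/2))^(1/2)),
    # which keeps every intermediate matrix symmetric PSD.
    s = _sqrtm_psd(cov2)
    tr_covmean = np.trace(_sqrtm_psd(s @ cov1 @ s))
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(cov1) + np.trace(cov2) - 2.0 * tr_covmean)
```

In practice the statistics come from Inception-v3 activations over tens of thousands of images; identical distributions give a distance of zero, and lower is better.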
Conclusion and Future Work
The paper presents a compendium of methodological advancements that collectively enhance the image quality of generative models. The transition from StyleGAN to StyleGAN2 underscores a methodical refinement process addressing both architectural design and training paradigms. These improvements set a new benchmark in generative image modeling.
Future research could delve into further refining path length regularization, potentially exploring feature-space metrics to replace pixel-space distances. Additionally, reducing training data requirements without compromising output quality remains a critical challenge, particularly for datasets with inherent variability or limited sample availability.
In summary, the contributions of Karras et al. present a significant advance in the domain of generative adversarial networks, offering valuable insights and practical improvements that are likely to inspire subsequent research and applications in AI image synthesis.