StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis
This paper addresses the rising need for a consistent, reproducible framework for evaluating Generative Adversarial Networks (GANs) in image synthesis. It introduces StudioGAN, an open-source library that enables reliable GAN benchmarking through standardized implementations, training protocols, and evaluation metrics.
Taxonomy and Implementation
GANs are categorized along five primary dimensions: architecture, conditioning methods, adversarial losses, regularization, and data-efficient training. StudioGAN encompasses a comprehensive collection of modules supporting:
- 7 GAN architectures (from DCGAN to StyleGAN3)
- 9 conditioning methods
- 4 adversarial losses
- 12 regularization modules
- 3 differentiable augmentations
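Among these dimensions, the adversarial loss is the easiest to illustrate in isolation. The sketch below is not StudioGAN's actual (PyTorch-based) implementation; it is a minimal numpy illustration of two loss families the library covers, the non-saturating GAN loss and the hinge loss, computed from raw discriminator logits on real and generated samples.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def nonsaturating_d_loss(d_real, d_fake):
    # -log sigmoid(D(x)) - log(1 - sigmoid(D(G(z)))), averaged over the batch.
    return np.mean(softplus(-d_real)) + np.mean(softplus(d_fake))

def nonsaturating_g_loss(d_fake):
    # Non-saturating generator objective: -log sigmoid(D(G(z))).
    return np.mean(softplus(-d_fake))

def hinge_d_loss(d_real, d_fake):
    # Hinge loss penalizes real logits below +1 and fake logits above -1.
    return np.mean(np.maximum(0.0, 1.0 - d_real)) + np.mean(np.maximum(0.0, 1.0 + d_fake))

def hinge_g_loss(d_fake):
    # Hinge generator objective simply maximizes the fake logits.
    return -np.mean(d_fake)
```

In practice these functions would receive logits from a discriminator network; swapping one loss for another while holding everything else fixed is exactly the kind of controlled comparison the taxonomy enables.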
These components enable a systematic approach for researchers aiming to implement, compare, and enhance GANs within a unified environment.
Evaluation Protocols and Benchmarks
A critical contribution of this work is the establishment of evaluation protocols that mitigate inconsistencies arising from variations in data preprocessing and evaluation backbones. An extensive benchmark spans datasets such as CIFAR10 and ImageNet and reports complementary metrics, including Inception Score (IS), Fréchet Inception Distance (FID), Precision, and Recall, to give a multidimensional view of model performance.
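FID compares the mean and covariance of real and generated features under a Gaussian assumption. The full protocol extracts those features from a backbone (InceptionV3 or SwAV); the sketch below covers only the final Fréchet distance over precomputed feature statistics, using an eigendecomposition to evaluate the trace of the matrix square root without external dependencies.

```python
import numpy as np

def _sqrtm_psd(mat):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    # Tr((S1 S2)^{1/2}) equals Tr((S1^{1/2} S2 S1^{1/2})^{1/2}), which is
    # symmetric, so its eigenvalues can be used directly.
    a = _sqrtm_psd(sigma1)
    m = a @ sigma2 @ a
    covmean_trace = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(m), 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)
```

Because the statistics depend entirely on the feature extractor, two FID values are only comparable when computed with the same backbone and preprocessing, which is precisely the inconsistency the paper's protocols are designed to eliminate.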
Key Findings
The evaluation reveals:
- StyleGAN2 tends to offer higher recall and coverage values, suggesting enhanced diversity compared to BigGAN, despite occasionally lower quality in more complex distributions.
- Evaluation backbones: The paper underscores the influence of the choice of evaluation backbone, with SwAV producing results more consistent with human perception, while InceptionV3 tends to favor certain models.
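The recall and coverage comparison above rests on manifold-based metrics. A common formulation (improved precision and recall, estimated via k-nearest-neighbor hyperspheres in feature space) can be sketched in a few lines of numpy; this is an illustrative simplification over precomputed features, not StudioGAN's exact implementation, and the function names and the default `k` are assumptions.

```python
import numpy as np

def _knn_radii(feats, k):
    # Radius of each point = distance to its k-th nearest neighbor.
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k]  # column 0 is the zero self-distance

def knn_precision_recall(real, fake, k=3):
    real_r = _knn_radii(real, k)
    fake_r = _knn_radii(fake, k)
    # cross[i, j] = distance from fake sample i to real sample j.
    cross = np.linalg.norm(fake[:, None, :] - real[None, :, :], axis=-1)
    # Precision: fraction of fakes falling inside the estimated real manifold.
    precision = np.mean(np.any(cross <= real_r[None, :], axis=1))
    # Recall: fraction of reals falling inside the estimated fake manifold.
    recall = np.mean(np.any(cross.T <= fake_r[None, :], axis=1))
    return float(precision), float(recall)
```

Under this reading, StyleGAN2's higher recall means a larger fraction of real features lie inside the generated manifold, i.e., the generator misses fewer modes of the data distribution.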
Potential Hazards and Considerations
The paper highlights potential biases introduced by particular evaluation backbones and emphasizes the need for a balanced approach that considers multiple metrics. It raises concerns regarding intra-class fidelity and the dependency of metrics on the evaluation setup, urging caution in drawing conclusions solely from FID values.
Practical and Theoretical Implications
The work points to significant implications for both the practical deployment of GANs in industrial settings and theoretical advancements in generative modeling. The detailed benchmark can guide the optimization of GAN training paradigms while advancing knowledge of model capabilities and limits.
Future Directions
The paper calls for refined evaluation methodologies that incorporate human-in-the-loop assessment and for exploring GANs in broader, open-world image synthesis tasks. Despite growing interest in alternative generative families such as diffusion and autoregressive (AR) models, GANs remain efficient in parameter count and synthesis speed, positioning them as competitive alternatives in large-scale applications.
In conclusion, StudioGAN offers an invaluable resource for the generative modeling community, enhancing reproducibility and fairness in evaluating GAN architectures and laying the groundwork for future advances in generative image synthesis.