- The paper introduces a lightweight GAN with an SLE module and self-supervision that improves training speed and stability for high-resolution few-shot image synthesis.
- It demonstrates competitive FID scores and robust performance across 13 diverse datasets using limited training samples and reduced computational cost.
- The model’s innovations enable efficient style mixing and content-style disentanglement, with significant implications for practical high-fidelity image synthesis.
Toward Efficient GAN Training for High-Fidelity Few-Shot Image Synthesis
The paper "Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis" addresses a significant challenge for Generative Adversarial Networks (GANs): synthesizing high-fidelity images with limited computational resources and few training samples. The authors introduce a lightweight GAN that trains efficiently at 1024×1024 resolution, requiring only a few hours on a single RTX-2080 GPU and performing reliably with fewer than 100 training images.
Methodological Innovations
The core contributions of the paper lie in two innovative design elements that enhance the GAN training process under stringent constraints:
- Skip-Layer Channel-Wise Excitation (SLE) Module: This architectural feature uses low-scale activations to re-weight channel responses in higher-resolution feature maps, enabling more efficient gradient flow through the model. Unlike traditional skip connections or residual blocks, SLE relies on channel-wise multiplication, which is computationally cheap and supports style/content disentanglement, a functionality akin to that of StyleGAN (a minimal sketch of the module follows this list).
- Self-Supervised Discriminator: The paper regularizes the discriminator by training it as a feature encoder paired with an auxiliary decoder that reconstructs the input images. This encourages the discriminator to extract more comprehensive feature maps, which in turn provide more informative training signals to the generator. Among the self-supervision strategies considered, auto-encoding proved the most effective (a sketch of this regularizer also appears below).
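To make the SLE mechanism concrete, here is a minimal PyTorch sketch. The pooling size, channel widths, and activation are plausible assumptions based on the paper's description, not the authors' exact configuration; the essential idea is that a low-resolution feature map is squeezed into per-channel weights that gate a high-resolution feature map.

```python
import torch
import torch.nn as nn

class SLE(nn.Module):
    """Skip-Layer Excitation: gate a high-resolution feature map with
    channel-wise weights computed from a low-resolution feature map.
    Layer sizes are illustrative assumptions, not the paper's exact config."""
    def __init__(self, ch_low, ch_high):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),                     # squeeze spatial dims to 4x4
            nn.Conv2d(ch_low, ch_high, 4, bias=False),   # 4x4 conv -> 1x1 spatial
            nn.LeakyReLU(0.1),
            nn.Conv2d(ch_high, ch_high, 1, bias=False),
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x_low, x_high):
        # Channel-wise multiplication: the (B, C, 1, 1) gate broadcasts over H x W.
        return x_high * self.gate(x_low)

# Example: excite 512x512 features (64 channels) with 8x8 features (512 channels).
sle = SLE(ch_low=512, ch_high=64)
x_low = torch.randn(1, 512, 8, 8)
x_high = torch.randn(1, 64, 512, 512)
out = sle(x_low, x_high)   # shape: (1, 64, 512, 512)
```

Because the gate is a 1×1 spatial tensor, the extra cost is a few small convolutions, which is why SLE is far cheaper than a conventional skip connection that concatenates or adds full-resolution feature maps.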
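And a hedged sketch of the auto-encoding regularization on the discriminator. The paper reconstructs both a downsampled full image and a random crop and scores them with a perceptual loss; this sketch uses a single decoder and plain MSE for brevity, and all layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDecoder(nn.Module):
    """Tiny decoder mapping discriminator features back to a small RGB image.
    Layer widths are illustrative, not the paper's exact architecture."""
    def __init__(self, ch_in):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(ch_in, 128, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, feat):
        return self.net(feat)

def reconstruction_loss(decoder, feats_real, real_images):
    """Auto-encoding regularizer: features D extracts from *real* images must
    suffice to reconstruct a downsampled copy of those images.
    The paper uses a perceptual loss; plain MSE keeps the sketch simple."""
    recon = decoder(feats_real)                    # e.g. (B, 3, 32, 32) from 8x8 features
    target = F.interpolate(real_images, size=recon.shape[-2:],
                           mode='bilinear', align_corners=False)
    return F.mse_loss(recon, target)

# Illustrative usage with dummy tensors (the feature map would come from D):
decoder = SimpleDecoder(ch_in=256)
feats = torch.randn(4, 256, 8, 8)     # stand-in for D's intermediate features
reals = torch.randn(4, 3, 256, 256)   # stand-in for real training images
loss = reconstruction_loss(decoder, feats, reals)
```

This term is added to the discriminator loss on real samples only, so the discriminator cannot collapse to a few discriminative shortcuts: its features must retain enough information to describe the whole image.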
Experimental Insights
The paper demonstrates the model's proficiency across thirteen datasets of varying complexity and style, outperforming the state-of-the-art StyleGAN2 in limited-data, limited-compute settings. Key highlights from the experiments include:
- Efficiency Gains: The proposed GAN achieves FID scores comparable to or better than StyleGAN2 at significantly reduced computational cost, and it converges quickly enough to support style mixing and content-style disentanglement (a minimal FID-measurement sketch follows this list).
- Robustness Across Domains: The model is validated extensively on datasets ranging from artistic paintings to real-world photos, illustrating its robustness across diverse image domains.
- Self-Supervision Utility: The introduction of self-supervised learning into GANs helps stabilize training and prevent issues such as mode collapse, even with tiny datasets.
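For context on the metric: FID compares InceptionV3 feature statistics of generated and real images, and lower is better. One minimal way to compute it, assuming the third-party pytorch-fid package (pip install pytorch-fid) rather than the authors' evaluation code, and with placeholder folder paths:

```python
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

# Placeholder paths: two folders of image files (real vs. generated samples).
fid = calculate_fid_given_paths(
    ['path/to/real_images', 'path/to/generated_images'],
    batch_size=50,
    device='cuda' if torch.cuda.is_available() else 'cpu',
    dims=2048,             # pool3 features of InceptionV3, the standard choice
)
print(f'FID: {fid:.2f}')   # lower is better
```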
Potential Implications
The methodological advances in this paper hold promise for both theory and practice. Theoretically, introducing self-supervision into GAN training opens new avenues for studying model robustness and generalization in data-scarce regimes. Practically, the work enables high-fidelity image synthesis where data is inherently limited, such as rare medical conditions or bespoke artistic styles, a significant step toward democratizing high-quality image synthesis.
Future Directions
Looking forward, deeper exploration of self-supervision in discriminators could further improve GAN stability and performance in unconventional training scenarios. Applying these methods to larger datasets with more varied classes would also help validate the model's scalability and adaptability. Finally, investigating the interplay between the SLE module and other architectural advances in GANs could yield insights into the trade-off between computational efficiency and synthesis quality.
In sum, the authors present a compelling approach to addressing the challenges of high-fidelity image synthesis with GANs under constrained computational and data conditions. Their contributions could significantly impact future research directions and practical applications in AI-driven image synthesis.