High-resolution Deep Convolutional Generative Adversarial Networks: A Summary
The paper "High-resolution Deep Convolutional Generative Adversarial Networks" introduces HDCGAN, an advancement in the field of Generative Adversarial Networks (GANs) specifically tailored for generating high-resolution images. The work addresses a critical challenge within GAN research: effectively generating high-quality images while maintaining convergence stability, particularly in the high-dimensional pixel space associated with high-resolution datasets.
Overview of Contributions
- HDCGAN Architecture: The authors propose HDCGAN, which extends the Deep Convolutional Generative Adversarial Network (DCGAN) framework. The architecture combines Scaled Exponential Linear Units (SELU) with BatchNorm layers, collectively referred to as BS layers, to improve convergence stability and image quality. This enables successful training on images at resolutions up to 512×512 pixels, a regime in which many traditional GAN approaches fail due to mode collapse and unstable training.
- Magnifying Glass Approach: The paper introduces the concept of "Glasses," which enlarges the input dimensions of the neural network by a telescope factor, denoted ζ. This lets finer details be captured and represented in the generated images, improving the final output without modifying the convolutional filter sizes.
- Curtó Dataset: The dataset introduced, named Curtó, offers a diverse collection of human faces spanning ethnic backgrounds and illumination conditions. As a complex yet balanced dataset, Curtó serves as a suitable benchmark for assessing GAN performance in generating images with diverse inherent attributes. It ships with extensive labels, facilitating a range of classification tasks.
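The two architectural ideas above can be sketched concretely. The snippet below illustrates a BS block (BatchNorm followed by SELU) and the "Glasses" input enlargement in plain NumPy. The operation ordering, the omission of learned affine parameters, and the nearest-neighbour interpolation are simplifying assumptions for illustration, not details prescribed by the paper:

```python
import numpy as np

# SELU constants from Klambauer et al. (2017), the self-normalizing activation
# used in HDCGAN's BS layers.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled Exponential Linear Unit."""
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def batch_norm(x, eps=1e-5):
    """Per-feature batch normalization (no learned scale/shift, for brevity)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def bs_layer(x):
    """A 'BS' block: BatchNorm followed by SELU (assumed ordering)."""
    return selu(batch_norm(x))

def glasses(image, zeta=2):
    """'Glasses': enlarge the input by telescope factor zeta.
    Nearest-neighbour upsampling is an assumption here; filter sizes in the
    downstream network are left unchanged, as in the paper."""
    return np.repeat(np.repeat(image, zeta, axis=0), zeta, axis=1)
```

For example, `glasses` turns a 256×256 input into a 512×512 one with ζ = 2, so the convolutional filters sweep a finer grid without being resized themselves.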
Numerical Results
In empirical evaluations, HDCGAN achieves superior results on metrics such as Multi-scale Structural Similarity (MS-SSIM) and Fréchet Inception Distance (FID). On the CelebA dataset, HDCGAN attains an MS-SSIM of 0.1978 and an FID of 8.44, improving upon existing methodologies (lower is better for both: low MS-SSIM between generated samples indicates diversity, and low FID indicates closeness to the real data distribution). The incorporation of BS layers stabilizes the adversarial training process, yielding a consistent reduction in error for both the generator and the discriminator.
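The FID reported above is the Fréchet distance between Gaussians fitted to Inception features of real and generated images. Assuming the feature statistics (μ, Σ) have already been extracted, the distance itself is a short computation; this sketch uses an eigenvalue identity in place of the matrix square root found in standard implementations:

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    # For symmetric positive-definite covariances, sigma1 @ sigma2 has real,
    # non-negative eigenvalues, and Tr((sigma1 sigma2)^{1/2}) equals the sum
    # of their square roots.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    covmean_trace = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace
```

Identical distributions yield a distance of 0; shifting one mean by a unit vector while keeping identity covariances yields exactly 1.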
Practical and Theoretical Implications
The practical implications of this research are significant for various computer vision tasks requiring high-quality image synthesis, including image inpainting, 3D model generation, and domain translation. HDCGAN's ability to produce high-resolution outputs while maintaining training stability can be leveraged in fields such as digital content creation and augmented reality, where synthetic image generation is increasingly prevalent.
Theoretically, the paper contributes to the understanding of GAN architectures by exploring the use of self-normalizing networks and the impact of neural network input scaling on generative performance. The introduction of the "Glasses" technique invites further exploration into architectural adaptations that allow neural networks to handle high-resolution data more effectively.
Speculation on Future Developments
Future developments in this domain may include improving the robustness of GANs trained on fewer examples and exploring alternative activation functions that could further stabilize high-resolution image generation. Additionally, integrating HDCGAN with other cutting-edge techniques may yield further gains in image quality and diversity. Research could also investigate the theoretical underpinnings of minimum-variance unbiased (MVU) estimators within large-scale GAN architectures, illuminating new pathways for efficient and resilient training methodologies in generative models.