Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy (1906.01529v6)

Published 4 Jun 2019 in cs.LG and cs.CV

Abstract: Generative adversarial networks (GANs) have been extensively studied in the past few years. Arguably their most significant impact has been in the area of computer vision where great advances have been made in challenges such as plausible image generation, image-to-image translation, facial attribute manipulation and similar domains. Despite the significant successes achieved to date, applying GANs to real-world problems still poses significant challenges, three of which we focus on here. These are: (1) the generation of high quality images, (2) diversity of image generation, and (3) stable training. Focusing on the degree to which popular GAN technologies have made progress against these challenges, we provide a detailed review of the state of the art in GAN-related research in the published scientific literature. We further structure this review through a convenient taxonomy we have adopted based on variations in GAN architectures and loss functions. While several reviews for GANs have been presented to date, none have considered the status of this field based on their progress towards addressing practical challenges relevant to computer vision. Accordingly, we review and critically discuss the most popular architecture-variant, and loss-variant GANs, for tackling these challenges. Our objective is to provide an overview as well as a critical analysis of the status of GAN research in terms of relevant progress towards important computer vision application requirements. As we do this we also discuss the most compelling applications in computer vision in which GANs have demonstrated considerable success along with some suggestions for future research directions. Code related to GAN-variants studied in this work is summarized on https://github.com/sheqi/GAN_Review.

Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy

This comprehensive survey systematically examines the role of Generative Adversarial Networks (GANs) in advancing computer vision applications, particularly focusing on the challenges and solutions in generating high-quality and diverse images while ensuring stable training. The authors, Wang, She, and Ward, present a detailed taxonomy of GANs, categorizing them based on architectural and loss-function variants, and offer insights into their practical applications.

Key Challenges

The paper identifies three core challenges in deploying GANs for real-world applications:

  1. High-Quality Image Generation: Ensuring that generated images are indistinguishable from real images.
  2. Image Diversity: Avoiding mode collapse, so that the generator covers the full range of modes in the real data distribution.
  3. Stable Training: Maintaining convergence and addressing issues such as vanishing gradients (the original objective from which these issues stem is recalled below).
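
For context, both the diversity and the stability challenges are commonly traced back to the original GAN objective, which the loss-variant GANs surveyed later set out to replace. Quoted here for reference (it is not reproduced in this summary), the standard minimax form is:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$

When the discriminator D becomes too accurate, the gradient of the second term with respect to the generator G vanishes; and G can lower its loss by producing only a few convincing modes, which is exactly the mode-collapse behaviour listed above.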

Architectural Variants

The survey classifies architectural advances into several types, beginning with the original GAN, which relied on fully connected layers, and progressing to more sophisticated models built on convolutional layers and other mechanisms:

  • Fully-connected GAN (FCGAN): Early architectures that often struggled with scalability and image quality.
  • Deep Convolutional GAN (DCGAN): Replaced fully connected layers with convolutional and transposed-convolutional (deconvolutional) layers, improving image quality and enabling higher-resolution generation; a minimal generator sketch follows this list.
  • Self-attention GAN (SAGAN) and BigGAN: Enhanced both image diversity and quality using self-attention mechanisms and large-scale architectures.
  • Progressive GAN (PROGAN): Employed a progressively growing architecture which contributed to stable training and high-resolution image generation.
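
To make the DCGAN design concrete, the sketch below shows a DCGAN-style generator. This is a minimal sketch assuming PyTorch; the latent dimension (100), feature-map width (64), and 64x64 output resolution are illustrative choices, not values taken from the survey or its repository.

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Minimal DCGAN-style generator: a latent vector is upsampled to a
    64x64 RGB image with transposed convolutions, batch norm, and ReLU,
    following the general DCGAN guidelines (no fully connected layers,
    tanh output)."""
    def __init__(self, latent_dim=100, feature_maps=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # (latent_dim) x 1 x 1 -> (feature_maps*8) x 4 x 4
            nn.ConvTranspose2d(latent_dim, feature_maps * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(feature_maps * 8),
            nn.ReLU(inplace=True),
            # -> (feature_maps*4) x 8 x 8
            nn.ConvTranspose2d(feature_maps * 8, feature_maps * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 4),
            nn.ReLU(inplace=True),
            # -> (feature_maps*2) x 16 x 16
            nn.ConvTranspose2d(feature_maps * 4, feature_maps * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps * 2),
            nn.ReLU(inplace=True),
            # -> (feature_maps) x 32 x 32
            nn.ConvTranspose2d(feature_maps * 2, feature_maps, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feature_maps),
            nn.ReLU(inplace=True),
            # -> (channels) x 64 x 64; tanh maps pixels to [-1, 1]
            nn.ConvTranspose2d(feature_maps, channels, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, latent_dim, 1, 1)
        return self.net(z)

# Usage: sample a batch of latent vectors and generate fake images.
z = torch.randn(16, 100, 1, 1)
fake_images = DCGANGenerator()(z)  # shape: (16, 3, 64, 64)
```

The design choice that matters here is the absence of fully connected layers: every upsampling step is a strided transposed convolution followed by batch normalization, which is the main architectural shift DCGAN introduced over the fully-connected original.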

Loss Function Variants

To address the instability caused by the original GAN loss function, several loss-function variants have been proposed to improve the learning process:

  • Wasserstein GAN (WGAN) and WGAN-GP: Replaced the original divergence with the Wasserstein distance (WGAN-GP substituting a gradient penalty for WGAN's weight clipping), yielding smoother convergence and mitigating mode collapse; a critic-loss sketch follows this list.
  • Least Squares GAN (LSGAN): Proposed a least-squares loss that penalizes samples lying far from the decision boundary, steering generated samples towards the real data distribution.
  • Spectral Normalization GAN (SN-GAN): Improved training stability by constraining the Lipschitz constant of the discriminator through spectral normalization of its weights.
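
To make the Wasserstein variant concrete, the sketch below shows a WGAN-GP-style critic loss. This is a minimal sketch assuming PyTorch; the function name wgan_gp_critic_loss is illustrative, the critic network is assumed to be defined elsewhere, and the penalty weight of 10 follows the value commonly used for WGAN-GP.

```python
import torch

def wgan_gp_critic_loss(critic, real, fake, gp_weight=10.0):
    """WGAN-GP critic loss: maximize D(real) - D(fake) (written here as a
    minimization), plus a gradient penalty that pushes the critic's gradient
    norm towards 1 on points interpolated between real and fake samples."""
    # When training the critic, gradients should not flow back into the generator.
    fake = fake.detach()

    # Wasserstein term (negated because optimizers minimize).
    wasserstein = critic(fake).mean() - critic(real).mean()

    # Gradient penalty on random interpolations between real and fake samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interpolated = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(
        outputs=critic(interpolated).sum(),
        inputs=interpolated,
        create_graph=True,
    )[0]
    grad_norm = grad.view(grad.size(0), -1).norm(2, dim=1)
    penalty = ((grad_norm - 1) ** 2).mean()

    return wasserstein + gp_weight * penalty
```

The generator is trained in a separate step to minimize -critic(generator(z)).mean(); the gradient penalty is what lets WGAN-GP enforce the critic's 1-Lipschitz constraint without the weight clipping used in the original WGAN.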

Implications and Future Directions

The extensive review of GAN architectures and loss functions delineates the significant progress made in addressing the core challenges of GANs. The stated aim is to provide insights that help researchers select appropriate GAN configurations for their specific computer vision applications. The paper also explores emerging opportunities, particularly in extending GAN capabilities to areas such as video generation and time-series synthesis.

Given the presented taxonomy, this work positions itself as a critical resource for understanding the landscape of GAN technology in computer vision. Future endeavors could build on these foundations, exploring further innovations in stable training dynamics and the development of generative models for less-explored domains like natural language processing.

In summary, this survey encapsulates the evolution and potential trajectories of GANs in computer vision, advocating for continued exploration of architectural innovations and optimization strategies to overcome persisting practical challenges.

Authors (3)
  1. Zhengwei Wang (15 papers)
  2. Qi She (37 papers)
  3. Tomas E. Ward (15 papers)
Citations (87)