An Examination of CVAE-GAN: Asymmetric Training for Fine-Grained Image Generation
The paper presents an approach to generating fine-grained images within specific categories, such as faces of a particular person or objects of a defined class, by integrating a Conditional Variational Auto-Encoder (CVAE) and a Generative Adversarial Network (GAN) into a unified framework termed CVAE-GAN. The work targets complementary limitations of the two base models, namely the blurriness typical of VAE outputs and the training instability typical of GANs, through an asymmetric training methodology.
Core Contributions
- Asymmetric Loss Function: The generative network is trained with a mean feature matching objective, while the discriminative and classifier networks retain their standard cross-entropy losses. This asymmetry stabilizes GAN training and mitigates common failure modes such as vanishing gradients and mode collapse: rather than trying to fool the discriminator directly, the generator minimizes the ℓ2 distance between the mean features of generated and real data (a minimal sketch follows this list).
- Encoder Network Utilization: An encoder network establishes a mapping from real images into the latent space, enabling reconstruction of images from latent vectors and encouraging output diversity, which counters the generator's tendency to collapse onto a few uniform outputs. Pixel-level reconstruction losses further keep structural elements of generated images, such as facial features, coherent with the inputs, preserving realism (also sketched below).
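
As a concrete illustration, here is a minimal PyTorch sketch of the mean feature matching objective. The `feature_extractor` callable is a hypothetical handle to an intermediate layer of the discriminative (or classification) network; names and shapes are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(feature_extractor, real_images, fake_images):
    # Batch-mean intermediate features of real and generated images.
    real_mean = feature_extractor(real_images).mean(dim=0)
    fake_mean = feature_extractor(fake_images).mean(dim=0)
    # The generator minimizes the L2 distance between the two means;
    # the real branch is detached so gradients only reach the generator.
    return F.mse_loss(fake_mean, real_mean.detach())
```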
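The encoder-side objectives can be sketched in the same spirit. The snippet below assumes hypothetical encoder `E` and generator `G` modules with the call signatures shown, and combines the standard KL regularization of the latent posterior with the pixel-level reconstruction term described above.

```python
import torch
import torch.nn.functional as F

def encoder_losses(E, G, images, labels):
    # E maps an image (and its class) to a Gaussian posterior over z.
    mu, logvar = E(images, labels)
    # Reparameterization trick: sample z while keeping gradients.
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    # KL term pulls the posterior q(z|x, c) toward the N(0, I) prior.
    kl = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    # Pixel-level L2 reconstruction keeps structure (e.g., facial
    # features) coherent between the input and its reconstruction.
    recon = F.mse_loss(G(z, labels), images)
    return kl, recon
```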
Methodology and Results
The CVAE-GAN framework comprises four interconnected modules: the encoder network (E), the generative network (G), the discriminative network (D), and the classification network (C). Each plays a distinct role in the model's synthesis pipeline (a schematic training step follows the list):
- Encoder Network: Maps real images into the latent space, producing a latent vector that serves as input for the generative network.
- Generative Network: Generates images from latent vectors, aiming to match the real data distribution through feature-level and pixel-level alignment.
- Discriminative Network: Distinguishes between real and synthesized images, providing feedback for generator optimization.
- Classification Network: Assesses class probabilities, supporting category-conditioned generation.
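
To make the interplay concrete, the following sketch wires the four modules through one illustrative training step. `E`, `G`, `D`, and `C` are assumed PyTorch modules with the call signatures shown; loss weights, optimizer updates, and the feature matching terms are simplified or omitted, so this is a schematic rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def training_step(E, G, D, C, images, labels):
    # Encoder E: real image + class label -> Gaussian posterior over z.
    mu, logvar = E(images, labels)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    # Generator G: latent code + class label -> synthesized image.
    fake = G(z, labels)

    # Discriminator D: trained with standard real/fake cross-entropy.
    d_real, d_fake = D(images), D(fake.detach())
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))

    # Classifier C: trained with cross-entropy over class labels.
    loss_C = F.cross_entropy(C(images), labels)

    # Generator/encoder side: pixel reconstruction + KL regularization;
    # the asymmetric mean feature matching terms against D and C would
    # be added here (see the earlier sketch).
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss_G = F.mse_loss(fake, images) + kl

    return loss_D, loss_C, loss_G
```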
Quantitative evaluations underscore CVAE-GAN's ability to generate realistic, structurally coherent, and diverse samples across multiple fine-grained categories, including faces, flowers, and birds. On comparative metrics such as image discriminability and inception score, CVAE-GAN shows marked improvements over baseline models such as CVAE and CGAN.
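
For reference, the inception score can be computed from a matrix of per-image class probabilities (e.g., softmax outputs of a pretrained classifier). This is a generic sketch of the metric, not code from the paper: it is the exponential of the mean KL divergence between each image's conditional label distribution and the marginal label distribution, where higher values indicate sharper, more diverse samples.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    # probs: (N, K) array, each row a predicted class distribution p(y|x).
    marginal = probs.mean(axis=0, keepdims=True)       # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))
```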
Practical Implications and Future Work
The paper demonstrates several practical applications of CVAE-GAN-generated images, including:
- Image Inpainting: Restoring missing regions of an image from its learned latent representation (see the sketch after this list).
- Attribute Morphing: Smoothly transforming one image into another by interpolating between their latent representations (also sketched below).
- Data Augmentation: Synthesizing additional training data, which the paper reports improves robustness and accuracy in tasks such as face recognition.
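
A minimal sketch of the inpainting idea, assuming hypothetical `E` and `G` modules as before and a binary mask marking the missing region; the blending strategy here is an illustrative simplification:

```python
import torch

def inpaint(E, G, image, mask, label):
    # Encode the corrupted image; use the posterior mean as its code.
    mu, _ = E(image, label)
    recon = G(mu, label)
    # Keep generated pixels only where mask == 1 (the missing region)
    # and the original pixels everywhere else.
    return image * (1 - mask) + recon * mask
```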
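Attribute morphing reduces to linear interpolation in latent space. The sketch below, under the same assumed `E`/`G` interfaces, encodes two endpoint images and decodes a sequence of intermediate latent codes:

```python
import torch

def morph(E, G, image_a, image_b, label, steps=8):
    # Encode both endpoint images to their posterior means.
    z_a, _ = E(image_a, label)
    z_b, _ = E(image_b, label)
    # Decode a linear path between the two latent codes.
    return [G((1 - t) * z_a + t * z_b, label)
            for t in torch.linspace(0.0, 1.0, steps)]
```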
Looking ahead, the asymmetric training scheme could be refined to handle previously unseen categories, extending CVAE-GAN to broader domains of AI-driven image synthesis. Continued work on conditional and unconditional generative models may further improve the ability to reproduce complex visual patterns and distributions accurately.
CVAE-GAN's nuanced combination of variational auto-encoding and adversarial networks marks a notable stride in computer vision, enabling controlled generation of high-quality, diverse, and contextually relevant images across a spectrum of categories.