ContraGAN: Conditional Image Generation Through Contrastive Learning
The paper presents a novel approach to conditional image generation, introducing the Contrastive Generative Adversarial Network (ContraGAN). The primary innovation is a conditional contrastive loss (2C loss) that simultaneously considers data-to-class and data-to-data relations within the generative adversarial network (GAN) framework.
Conditional Image Generation and GANs
Conditional image generation is the task of generating diverse images conditioned on class labels. Traditional conditional GANs focus on data-to-class relations, measuring the proximity between the embedding of an image and the embedding of its corresponding label. Popular models such as ACGAN and ProjGAN have leveraged this approach to achieve realistic image synthesis.
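To make the data-to-class idea concrete, below is a minimal sketch of a projection-style discriminator head in the spirit of ProjGAN, where the adversarial score adds an inner product between image features and a learned class embedding. The module names, dimensions, and PyTorch framing are assumptions of this sketch, not the original code.

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Data-to-class conditioning sketch (ProjGAN-style).

    The discriminator score is an unconditional term plus the inner
    product <e(y), h> between a learned class embedding e(y) and the
    image features h -- a pure data-to-class relation.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(feat_dim, 1)               # unconditional score
        self.embed = nn.Embedding(num_classes, feat_dim)   # class embedding e(y)

    def forward(self, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # h: (B, feat_dim) discriminator features; y: (B,) class labels
        uncond = self.linear(h).squeeze(1)        # (B,)
        cond = (self.embed(y) * h).sum(dim=1)     # <e(y), h>
        return uncond + cond
```

Note that each image interacts only with its own label embedding; no term relates one image in the batch to another.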
However, these losses treat each sample in isolation, capturing only data-to-class relations and neglecting the data-to-data relations available within a training batch. Discriminator overfitting and training collapse are noted limitations of these traditional models.
Introducing ContraGAN
ContraGAN seeks to enhance GAN-based image generation by incorporating data-to-data relations through a conditional contrastive loss: the embeddings of images that share a label are pulled together, while the embeddings of images from different classes are pushed apart.
The 2C loss, a modification of the NT-Xent loss used in self-supervised learning, integrates both data-to-class and data-to-data relations without requiring extensive data augmentation.
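Concretely, for a batch of $m$ samples, the 2C loss keeps the NT-Xent structure but adds a class-embedding (proxy) term. Writing $l(x)$ for the discriminator's projected embedding, $e(y)$ for the class embedding, and $t$ for the temperature, the per-sample loss takes the following form (following the paper, up to notation):

$$
\ell_{2C}(x_i, y_i; t) = -\log \frac{\exp\big(l(x_i)^\top e(y_i)/t\big) + \sum_{k=1}^{m} \mathbb{1}_{[y_k = y_i,\, k \neq i]} \exp\big(l(x_i)^\top l(x_k)/t\big)}{\exp\big(l(x_i)^\top e(y_i)/t\big) + \sum_{k=1}^{m} \mathbb{1}_{[k \neq i]} \exp\big(l(x_i)^\top l(x_k)/t\big)}
$$

The numerator pulls a sample toward its class embedding (data-to-class) and toward same-class batch members (data-to-data); the denominator pushes it away from all other samples. A minimal PyTorch sketch follows; the function and tensor names, the normalization step, and the lack of log-sum-exp stabilization are assumptions of this sketch, not the reference implementation:

```python
import torch
import torch.nn.functional as F

def contrastive_2c_loss(embeds: torch.Tensor,
                        proxies: torch.Tensor,
                        labels: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Sketch of the 2C loss.

    embeds:  (B, D) sample embeddings l(x_i)
    proxies: (B, D) class embeddings e(y_i), one row per sample's label
    labels:  (B,)   integer class labels
    """
    embeds = F.normalize(embeds, dim=1)
    proxies = F.normalize(proxies, dim=1)

    # Data-to-class similarities l(x_i)^T e(y_i) / t, shape (B,).
    d2c = (embeds * proxies).sum(dim=1) / temperature
    # Data-to-data similarities l(x_i)^T l(x_k) / t, shape (B, B).
    d2d = embeds @ embeds.t() / temperature

    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)                     # (B, B)
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)  # (B, B)

    # Numerator: class proxy + same-class batch members (positives).
    pos = torch.exp(d2c) + (torch.exp(d2d) * (same_class & not_self)).sum(dim=1)
    # Denominator: class proxy + every other batch member.
    den = torch.exp(d2c) + (torch.exp(d2d) * not_self).sum(dim=1)

    return -(pos / den).log().mean()
```

Since the positive terms are a subset of the denominator, the ratio lies in (0, 1] and the loss is minimized when same-class similarities dominate.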
Experimental Outcomes
ContraGAN outperforms existing models across multiple benchmarks, as measured by Fréchet Inception Distance (FID, lower is better):
- Tiny ImageNet: a 7.3% FID improvement over the best prior models.
- ImageNet: a substantial 7.7% FID reduction.
- CIFAR10: competitive results, within 1.3% of state-of-the-art performance.
These results underscore the benefit of harnessing both data-to-class and data-to-data relations, which helps curb discriminator overfitting and improves training stability.
Methodological Innovations
The 2C loss stands out by avoiding the explicit mining of positive and negative samples that is common practice in metric learning. Because positives and negatives are derived directly from the batch labels, the loss remains computationally efficient.
The paper also compares the 2C loss against other metric learning losses such as P-NCA (proxy-NCA) and NT-Xent, confirming the advantage of its dual-relation formulation, illustrated below.
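For reference, the two baselines can be written in simplified, similarity-based notation (the cited works differ in details such as distance functions and augmentation pairing):

$$
\ell_{\text{P-NCA}}(x_i, y_i) = -\log \frac{\exp\big(l(x_i)^\top e(y_i)\big)}{\sum_{y' \neq y_i} \exp\big(l(x_i)^\top e(y')\big)},
\qquad
\ell_{\text{NT-Xent}}(x_i, x_j) = -\log \frac{\exp\big(l(x_i)^\top l(x_j)/t\big)}{\sum_{k \neq i} \exp\big(l(x_i)^\top l(x_k)/t\big)}
$$

P-NCA uses only data-to-class terms (a sample versus class proxies), while NT-Xent uses only data-to-data terms and needs augmented views to define the positive pair; the 2C loss above combines both kinds of terms in a single objective.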
Implications and Future Directions
The paper contributes significantly to the field of conditional image synthesis by proposing an effective method to stabilize adversarial training and improve image quality. This approach could inform future research focusing on:
- Advanced Regularization: Investigating further regularization techniques to enhance GAN performance.
- Training Efficiency: Streamlining computational processes for large-scale datasets.
ContraGAN's approach is broadly applicable, offering promising directions for image-to-image translation, data generation, and other GAN-based tasks where understanding and generating complex visual data are crucial.
Overall, the ContraGAN framework represents a substantive step forward in conditional image generation with GANs, integrating insights from self-supervised learning to address existing limitations and pave the way for further advances in AI-driven image synthesis.