InfoMax-GAN: Improved Adversarial Image Generation via Information Maximization and Contrastive Learning (2007.04589v6)

Published 9 Jul 2020 in cs.LG, cs.CV, and stat.ML

Abstract: While Generative Adversarial Networks (GANs) are fundamental to many generative modelling applications, they suffer from numerous issues. In this work, we propose a principled framework to simultaneously mitigate two fundamental issues in GANs: catastrophic forgetting of the discriminator and mode collapse of the generator. We achieve this by employing for GANs a contrastive learning and mutual information maximization approach, and perform extensive analyses to understand sources of improvements. Our approach significantly stabilizes GAN training and improves GAN performance for image synthesis across five datasets under the same training and evaluation conditions against state-of-the-art works. In particular, compared to the state-of-the-art SSGAN, our approach does not suffer from poorer performance on image domains such as faces, and instead improves performance significantly. Our approach is simple to implement and practical: it involves only one auxiliary objective, has a low computational cost, and performs robustly across a wide range of training settings and datasets without any hyperparameter tuning. For reproducibility, our code is available in Mimicry: https://github.com/kwotsin/mimicry.

Authors (3)

Kwot Sin Lee
Ngoc-Trung Tran
Ngai-Man Cheung

Citations (56)

Summary

Overview of "InfoMax-GAN: Improved Adversarial Image Generation via Information Maximization and Contrastive Learning"

The paper "InfoMax-GAN: Improved Adversarial Image Generation via Information Maximization and Contrastive Learning" addresses two critical issues prevalent in Generative Adversarial Networks (GANs): catastrophic forgetting in discriminators and mode collapse in generators. The authors propose a novel framework that integrates contrastive learning and mutual information maximization to enhance the performance of GANs on image synthesis tasks across multiple datasets.

Technical Contributions

The paper introduces a GAN framework that simultaneously mitigates two fundamental GAN issues using a unified approach:

Catastrophic Forgetting Mitigation: The paper employs mutual information maximization on the discriminator's side. This encourages the discriminator to learn representations that remain useful over the long term, which reduces its tendency to forget previously learned distributions as training progresses.
Mode Collapse Mitigation: The authors apply contrastive learning to the generator to incentivize diverse outputs. The objective is the InfoNCE lower bound on mutual information, which rewards the generator when different latent codes produce distinguishable samples and so discourages collapse onto a few modes.
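
The contrastive objective mentioned above can be illustrated with a minimal sketch of the InfoNCE loss. It assumes that each positive pair consists of a latent code and a feature vector of the image generated from it, that the other latent codes in the batch serve as negatives, and that a plain dot product acts as the critic. The function names (`info_nce_loss`, `mi_lower_bound`) and the toy vectors are hypothetical and are not taken from the paper or from Mimicry.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (the assumed critic)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(features, latents):
    """Mean InfoNCE loss over a batch.

    features[i] and latents[i] form the positive pair; every latents[j]
    with j != i is treated as a negative for features[i].
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        scores = [dot(features[i], latents[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy of picking the positive among the n candidates.
        total += log_denom - scores[i]
    return total / n


def mi_lower_bound(loss, n):
    """InfoNCE bound: mutual information >= log(n) - loss."""
    return math.log(n) - loss


if __name__ == "__main__":
    # Toy batch: features that match their own latent code exactly.
    latents = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
    aligned = info_nce_loss(latents, latents)
    # Same features paired with the wrong latent codes.
    shuffled = info_nce_loss(latents, latents[1:] + latents[:1])
    print("aligned loss:", aligned)
    print("shuffled loss:", shuffled)
    print("bound for aligned batch:", mi_lower_bound(aligned, len(latents)))
```

With N pairs, log N minus the loss is a lower bound on the mutual information, so driving the loss down tightens the bound. When the pairs are indistinguishable, the loss equals log N and the bound is zero.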

Quantitative Results

The paper reports significant improvements over state-of-the-art models, such as SSGAN, on five datasets (ImageNet, CelebA, CIFAR-10, CIFAR-100, and STL-10). Notable performance enhancements include:

A reduction in FID of approximately 6.8 points on ImageNet relative to SNGAN.
Improved Inception Scores across various resolutions, indicating gains in both sample diversity and image quality.
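
For reference, FID (Fréchet Inception Distance) is conventionally defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below are notation introduced here rather than taken from the paper: \mu_r, \Sigma_r are the feature mean and covariance for real images, and \mu_g, \Sigma_g are those for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values are better, so a reduction in FID means the generated feature statistics sit closer to those of real images.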

The method adds only one auxiliary objective and has a low computational cost, which makes it practical to adopt.

Implications and Future Directions

The framework proposed by the authors has several implications:

Robustness Across Datasets: The authors report that the method performs robustly across datasets and training settings without any hyperparameter tuning, which suggests that it generalizes well.
Theoretical Advancements: By incorporating the InfoMax principle and demonstrating its effectiveness in a GAN setting, this work could encourage further exploration into information theory's application in GANs.
Practical Applications: The integration of contrastive learning and mutual information maximization into GANs paves the way for developing more stable and efficient models, which could significantly benefit industries relying on high-quality image synthesis, such as entertainment and healthcare.

The findings also open up pathways for further research into applying these principles to other domains beyond image synthesis, such as 3D view synthesis or video frame generation, where coherent representation learning is crucial. Overall, this work offers a balanced framework that reduces GAN-related issues and enhances both training stability and image quality, setting a solid foundation for future developments in adversarial image generation.

GitHub

GitHub - kwotsin/mimicry: [CVPR 2020 Workshop] A PyTorch GAN library that reproduces research results for popular GANs. (603 stars)
