- The paper shows that the gradient estimators used in MMD GANs and WGANs are unbiased for a fixed critic, but that training the discriminator on finite samples introduces bias into the generator's gradients.
- It demonstrates that the energy distance underlying Cramér GANs is a specific instance of the MMD, and leverages the MMD's IPM representation to apply gradient penalties for improved training.
- It introduces the Kernel Inception Distance (KID), an evaluation metric with an unbiased estimator, and shows that MMD GANs can train efficiently with much smaller critic networks than WGAN-GP.
Overview of "Demystifying MMD GANs"
Introduction
The paper "Demystifying MMD GANs" presents an insightful exploration into the training and evaluation of Generative Adversarial Networks (GANs), utilizing the Maximum Mean Discrepancy (MMD) as the critic function, thereby referred to as MMD GANs. This paper provides critical theoretical clarifications on bias in GAN loss functions and discusses kernel choices for MMD. Additionally, it introduces a new measure of GAN convergence called the Kernel Inception Distance (KID).
Theoretical Contributions
A key theoretical contribution of the paper is its analysis of bias in the gradient estimators used to optimize both MMD GANs and Wasserstein GANs (WGANs). It shows that, for a fixed critic, the gradient estimators of both losses are unbiased; however, learning the discriminator from finite sample sets introduces bias into the gradients of the generator's parameters. This insight matters because characterizing exactly where bias enters training can guide the design of better and more stable GANs.
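To ground this, here is a minimal NumPy sketch of the standard unbiased (U-statistic) estimator of the squared MMD on which these results rest; the Gaussian kernel is only a placeholder choice for illustration, not the kernel used in the paper's experiments.

```python
import numpy as np

def mmd2_unbiased(X, Y, kernel):
    """Unbiased U-statistic estimator of squared MMD between samples X and Y.

    X: (m, d) array of samples from P; Y: (n, d) array of samples from Q.
    kernel: function mapping two sample arrays to their Gram matrix.
    """
    m, n = len(X), len(Y)
    Kxx = kernel(X, X)
    Kyy = kernel(Y, Y)
    Kxy = kernel(X, Y)
    # Drop the diagonal terms so each within-sample expectation is unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel, used here purely as a placeholder choice."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))
```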
Kernel Choices and Training
The paper addresses the question of kernel selection for the MMD critic, establishing that the energy distance used in Cramér GANs is a specific instance of the MMD with a distance-induced kernel. It also exploits the integral probability metric (IPM) representation of the MMD, which allows training strategies developed for Wasserstein GANs to carry over directly: in particular, a gradient penalty can be applied to the MMD witness function, analogous to the Lipschitz constraint enforced in WGAN-GP.
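The sketch below illustrates, in PyTorch, how such a gradient penalty on the empirical MMD witness function might be implemented. The interpolation scheme and penalty coefficient follow the general WGAN-GP recipe; the critic, kernel, and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import torch

def rbf(A, B, sigma=1.0):
    # Gaussian kernel Gram matrix between two batches of critic features.
    sq = torch.cdist(A, B) ** 2
    return torch.exp(-sq / (2 * sigma ** 2))

def witness(z_feats, real_feats, fake_feats, kernel=rbf):
    # Empirical MMD witness: f(z) = mean_x k(z, x_real) - mean_y k(z, y_fake).
    return kernel(z_feats, real_feats).mean(1) - kernel(z_feats, fake_feats).mean(1)

def gradient_penalty(critic, real, fake, lam=10.0):
    """Penalize deviations of the witness gradient norm from 1 at interpolates.

    lam is a tunable penalty coefficient (an assumed value, as in WGAN-GP).
    """
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    f = witness(critic(interp), critic(real), critic(fake))
    grads, = torch.autograd.grad(f.sum(), interp, create_graph=True)
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()
```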
Practical Advantages
Empirical results highlight practical advantages of MMD GANs over WGANs. In particular, MMD GANs can use much smaller critic networks, which simplifies and accelerates training while matching the sample quality of WGAN-GP. The result is a more computationally efficient model without compromising the quality of generated samples.
Kernel Inception Distance (KID)
A further contribution is the Kernel Inception Distance (KID), a measure of GAN convergence computed as the squared MMD between Inception representations of real and generated samples. Unlike the Fréchet Inception Distance (FID), KID has an unbiased estimator, making it a more reliable metric for comparing models; the paper also uses KID to adapt the learning rate dynamically during training.
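A minimal sketch of computing KID is given below, assuming Inception features (e.g. pool3 activations) have already been extracted upstream. The cubic polynomial kernel with coefficient 1/d follows the paper's definition; the block-averaging the authors use to report means and standard deviations is omitted for brevity.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, coef0=1.0):
    # KID's kernel: k(x, y) = (x.y / d + 1) ** 3, with d the feature dimension.
    gamma = 1.0 / X.shape[1]
    return (gamma * X @ Y.T + coef0) ** degree

def kid(real_feats, fake_feats):
    """Unbiased KID estimate between two sets of Inception features."""
    Kxx = polynomial_kernel(real_feats, real_feats)
    Kyy = polynomial_kernel(fake_feats, fake_feats)
    Kxy = polynomial_kernel(real_feats, fake_feats)
    m, n = len(real_feats), len(fake_feats)
    # Same U-statistic form as the unbiased squared-MMD estimator.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()
```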
Experimental Results
Experiments on standard benchmark datasets, including MNIST, CIFAR-10, LSUN, and CelebA, demonstrate the efficacy of MMD GANs. In particular, MMD GANs perform best on the more complex datasets when using a mixture of rational quadratic kernels combined with a linear kernel. The results underscore the ability of MMD GANs to generate high-quality samples while benefiting from reduced critic complexity.
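For illustration, a rational-quadratic-plus-linear kernel of this kind might look as follows in NumPy; the particular grid of alpha values is an assumption for this sketch and should be treated as configurable.

```python
import numpy as np

def rq_plus_linear_kernel(X, Y, alphas=(0.2, 0.5, 1.0, 2.0, 5.0)):
    """Mixture of rational quadratic kernels plus a linear term.

    k(x, y) = sum_a (1 + ||x - y||^2 / (2a))^(-a)  +  x . y
    Mixing several alpha scales makes the critic sensitive to differences
    at multiple length scales; the alpha grid here is an assumed choice.
    """
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = sum((1.0 + sq / (2.0 * a)) ** (-a) for a in alphas)
    return K + X @ Y.T
```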
Implications and Future Directions
The findings of this paper have both practical and theoretical implications. From a practical perspective, the ability to use smaller critic networks and achieve faster training times can democratize the use of GANs in resource-constrained environments. Theoretically, the insights into gradient bias and unbiased evaluation metrics pave the way for more robust and stable GAN architectures.
Future developments in AI may explore further refinements of integral probability metrics, potentially uncovering new kernels and training strategies that enhance GAN performance. Additionally, the use of KID could be expanded to other generative models, providing a unified and reliable metric for evaluating generative performance across diverse applications.