Training generative neural networks via Maximum Mean Discrepancy optimization (1505.03906v1)

Published 14 May 2015 in stat.ML and cs.LG

Abstract: We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic---informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an unbiased estimate of the maximum mean discrepancy, which is the centerpiece of the nonparametric kernel two-sample test proposed by Gretton et al. (2012). We compare to the adversarial nets framework introduced by Goodfellow et al. (2014), in which learning is a two-player game between a generator network and an adversarial discriminator network, both trained to outwit the other. From this perspective, the MMD statistic plays the role of the discriminator. In addition to empirical comparisons, we prove bounds on the generalization error incurred by optimizing the empirical MMD.

Citations (502)

Summary

  • The paper introduces MMD as an alternative to adversarial discriminators, framing generative training as minimizing a two-sample test statistic.
  • It demonstrates that MMD nets achieve superior mean log density scores (e.g., 315 ± 2 on MNIST) compared to GANs, indicating enhanced empirical performance.
  • Theoretical contributions include deriving generalization bounds based on fat-shattering dimensions, underscoring MMD’s scalability for high-dimensional data.

Training Generative Neural Networks via Maximum Mean Discrepancy Optimization

This paper by Dziugaite, Roy, and Ghahramani explores the application of Maximum Mean Discrepancy (MMD) to training generative neural networks. The authors propose replacing the adversarial discriminator found in Generative Adversarial Networks (GANs) with the nonparametric kernel two-sample test built on MMD. The goal is an alternative to the discriminator-network strategy, which can be computationally intensive and suffers from difficulties such as keeping the two networks in step and discriminator overfitting.

Overview and Methodology

The authors frame the learning problem as minimizing a two-sample test statistic, leveraging the properties of MMD to ensure generated samples are statistically indistinguishable from real samples. Unlike the GAN framework, where a generator and discriminator engage in a minimax game, MMD nets minimize the statistical distance between generated and real samples directly, sidestepping the need for a discriminator network.
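
At the core of the method is the unbiased estimator of the squared MMD from Gretton et al. (2012). For real samples X = {x_1, ..., x_m}, generated samples Y = {y_1, ..., y_n}, and a kernel k, it reads:

```latex
\mathrm{MMD}_u^2(X, Y) =
  \frac{1}{m(m-1)} \sum_{i \neq i'} k(x_i, x_{i'})
  + \frac{1}{n(n-1)} \sum_{j \neq j'} k(y_j, y_{j'})
  - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j)
```

Because each y_j is produced by the generator from a noise draw, this quantity is differentiable in the generator's parameters (for a smooth kernel) and can therefore be minimized directly by gradient descent.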

Key contributions include:

  • The introduction of the MMD objective as a surrogate for the adversarial game in GANs.
  • Derivation of optimization algorithms for training the generator network by gradient descent on the unbiased MMD estimate (a sketch follows this list).
  • A demonstration of the empirical performance advantages when using MMD optimization over the adversarial approach in terms of computational efficiency and test density estimates.
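
The following is a minimal sketch of such a training step, assuming a Gaussian (RBF) kernel with a fixed bandwidth and a small fully connected generator in PyTorch; the layer sizes, bandwidth, and optimizer are illustrative choices rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

def rbf_kernel(a, b, bandwidth=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    # Unbiased estimate of squared MMD (Gretton et al., 2012):
    # diagonal terms are excluded from the within-sample sums.
    m, n = x.size(0), y.size(0)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    sum_xx = (k_xx.sum() - k_xx.diagonal().sum()) / (m * (m - 1))
    sum_yy = (k_yy.sum() - k_yy.diagonal().sum()) / (n * (n - 1))
    sum_xy = 2 * k_xy.mean()
    return sum_xx + sum_yy - sum_xy

# Illustrative generator: maps noise to data space (sizes are placeholders).
generator = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784), nn.Sigmoid()
)
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

def training_step(real_batch):
    # Minimize the empirical MMD between real data and generator samples.
    noise = torch.randn(real_batch.size(0), 32)
    fake_batch = generator(noise)
    loss = mmd2_unbiased(real_batch, fake_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In contrast to a GAN step, there is no inner update of a discriminator: the loss is a closed-form function of the two batches.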

Empirical Results

The authors present empirical evaluations on standard datasets such as MNIST and the Toronto Face Dataset, showcasing the efficacy of MMD nets. They highlight that, despite some visual artifacts in the generated samples, the networks achieve superior mean log density scores on held-out test sets compared to adversarial networks. Specifically, on the MNIST dataset, MMD nets achieved a mean log density of 315 ± 2, surpassing GANs' reported 225 ± 2.

Theoretical Contributions

The paper explores the theoretical underpinnings of the method, offering generalization bounds for models trained using MMD. By providing bounds on estimation error, the authors illustrate the capability of MMD nets to reliably approximate the population MMD with empirical estimates. These bounds, derived in the context of fat-shattering dimensions of function classes, deepen the understanding of MMD's applicability in high-dimensional settings.
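
Schematically (this is the standard empirical-risk decomposition, not the paper's exact statement, which is phrased via fat-shattering dimensions of the generator-induced function class), the bounds control the population MMD of the learned parameters \hat{\theta} through a uniform deviation term:

```latex
\mathrm{MMD}^2\!\left(p_{\mathrm{data}},\, q_{\hat{\theta}}\right)
\;\le\;
\inf_{\theta}\, \mathrm{MMD}^2\!\left(p_{\mathrm{data}},\, q_{\theta}\right)
\;+\;
2 \sup_{\theta}\, \Bigl| \mathrm{MMD}^2\!\left(p_{\mathrm{data}},\, q_{\theta}\right)
- \mathrm{MMD}_u^2\!\left(X,\, Y_{\theta}\right) \Bigr|
```

The paper's analysis bounds the supremum term in terms of the sample sizes and the complexity of the function class obtained by composing the kernel with the generator.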

Practical and Theoretical Implications

By replacing the discriminator with a computationally tractable, closed-form statistic, MMD nets offer the prospect of faster training and more stable convergence. The theoretical foundations presented suggest promising scalability for high-dimensional data generation, expanding the frontier for applications in generative modeling.

Future directions for this research might include:

  • Exploration of different kernel functions and their impact on MMD's performance in model training (a brief sketch follows this list).
  • Examination of the method's robustness across diverse datasets, particularly those outside the visual domain.
  • Integration of domain-specific kernels to enhance invariance properties in generated data, providing potentially more realistic outputs.
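
On the first point, one illustrative option (an assumption here for illustration, not something proposed in this paper) is a mixture of Gaussian kernels at several bandwidths, which avoids committing to a single length scale; it can be substituted for rbf_kernel in the sketch above.

```python
import torch

def mixture_rbf_kernel(a, b, bandwidths=(1.0, 2.0, 4.0, 8.0)):
    # Sum of Gaussian kernels at several bandwidths; the sum is still a
    # positive-definite kernel, so the corresponding MMD is well defined.
    sq_dists = torch.cdist(a, b) ** 2
    return sum(torch.exp(-sq_dists / (2.0 * bw ** 2)) for bw in bandwidths)
```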

In conclusion, this work positions MMD optimization as a compelling alternative to adversarial methods within the landscape of generative modeling. The theoretical exploration and empirical validation give credence to its potential, prompting further studies into its applications and optimization strategies in the broader AI domain.