
Maximum-Likelihood Augmented Discrete Generative Adversarial Networks (1702.07983v1)

Published 26 Feb 2017 in cs.AI, cs.CL, and cs.LG

Abstract: Despite the successes in capturing continuous distributions, the application of generative adversarial networks (GANs) to discrete settings, like natural language tasks, is rather restricted. The fundamental reason is the difficulty of back-propagation through discrete random variables combined with the inherent instability of the GAN training objective. To address these problems, we propose Maximum-Likelihood Augmented Discrete Generative Adversarial Networks. Instead of directly optimizing the GAN objective, we derive a novel and low-variance objective using the discriminator's output that corresponds to the log-likelihood. Compared with the original, the new objective is proved to be consistent in theory and beneficial in practice. The experimental results on various discrete datasets demonstrate the effectiveness of the proposed approach.

Authors (7)
  1. Tong Che (26 papers)
  2. Yanran Li (32 papers)
  3. Ruixiang Zhang (69 papers)
  4. R Devon Hjelm (32 papers)
  5. Wenjie Li (183 papers)
  6. Yangqiu Song (196 papers)
  7. Yoshua Bengio (601 papers)
Citations (228)

Summary

Maximum-Likelihood Augmented Discrete Generative Adversarial Networks: An Expert Overview

The paper, "Maximum-Likelihood Augmented Discrete Generative Adversarial Networks" (MaliGAN), addresses the challenge of applying Generative Adversarial Networks (GANs) to discrete data, such as text, which is inherently problematic due to the difficulty of optimizing discrete random variables with backpropagation. While GANs have shown tremendous success when dealing with continuous data (e.g., images), their efficacy in discrete settings has been limited because the discontinuity inherent in discrete data prevents direct gradient-based optimization. This paper aims to bridge that gap by proposing a novel training scheme that augments the GAN framework with a maximum-likelihood estimation component.

Core Contributions of the Paper

  1. Novel Objective Function for Discrete Data: The authors introduce an alternative training objective that uses the discriminator's output to construct importance weights for a weighted maximum-likelihood target on generated sequences (a sketch of this estimator follows this list). This objective is theoretically consistent and provides practical benefits over the standard GAN objective, which tends to be unstable when applied directly to discrete data.
  2. Variance Reduction Techniques: MaliGAN incorporates several variance reduction strategies to handle the high variance that arises from the discrete nature of the data and from the reinforcement-learning-style gradient estimator used in training. Techniques such as single real-data-based renormalization and mixed maximum likelihood estimation (MLE) and reinforcement-learning training are used to enhance training stability and efficiency.
  3. Theoretical Guarantees: The paper provides theoretical justification for the proposed method, showing that when the discriminator is optimal, optimizing the proposed objective drives the generator toward the true data distribution. Additionally, the approach is designed to converge under certain conditions, even when the discriminator is not perfectly trained.
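
The identity behind these claims is that an optimal discriminator satisfies D(x) = p_data(x) / (p_data(x) + p_G(x)), so the ratio r_D(x) = D(x) / (1 - D(x)) equals the density ratio p_data(x) / p_G(x); normalizing these ratios within a batch yields importance weights for a maximum-likelihood-style update of the generator. Below is a minimal PyTorch sketch of such a weighted score-function estimator. The function name, the baseline handling, and the batch-level normalization are illustrative assumptions based on the paper's description, not the authors' released code.

```python
import torch

def maligan_generator_loss(log_probs: torch.Tensor,
                           d_scores: torch.Tensor,
                           baseline: float = 0.0) -> torch.Tensor:
    """Importance-weighted maximum-likelihood surrogate for the generator.

    log_probs: (batch,) sum of log p_G(x) over each generated sequence;
               must carry gradients w.r.t. the generator parameters.
    d_scores:  (batch,) discriminator outputs D(x) in (0, 1) for the same
               sequences, treated as constants (no gradient flows to D here).
    baseline:  scalar b subtracted from the normalized weights for variance
               reduction; treated as a tunable hyperparameter in this sketch.
    """
    with torch.no_grad():
        # r_D(x) = D(x) / (1 - D(x)): the density ratio p_data / p_G
        # implied by an optimal discriminator.
        ratios = d_scores / (1.0 - d_scores).clamp_min(1e-8)
        # Renormalize within the batch so the weights sum to one,
        # then subtract the baseline.
        weights = ratios / ratios.sum() - baseline
    # Score-function surrogate: its gradient is
    # sum_i weights_i * grad log p_G(x_i).
    return -(weights * log_probs).sum()
```

In a full training loop one would sample sequences from the generator, score them with the discriminator, and back-propagate this loss to the generator, while the discriminator itself is trained with the usual binary cross-entropy GAN objective.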

Experimental Validation

MaliGAN is evaluated across several tasks, including discrete MNIST generation, poem generation, and sentence-level language modeling on the Penn Treebank dataset. Across these tasks, MaliGAN outperforms standard maximum likelihood estimation and other GAN-based approaches. For example, in poem generation, MaliGAN achieves higher BLEU-2 scores and lower perplexity, indicating better sequence generation quality.
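
For readers reproducing the evaluation: BLEU-2 is BLEU computed with uniform weights over unigram and bigram precision. A minimal sketch using NLTK follows; the tokenized example sentences are invented purely for illustration.

```python
from nltk.translate.bleu_score import sentence_bleu

# Hypothetical tokenized sequences, for illustration only.
reference = [["the", "moon", "rises", "over", "the", "lake"]]
candidate = ["the", "moon", "rises", "above", "the", "lake"]

# BLEU-2: uniform weights (0.5, 0.5) over unigram and bigram precision.
bleu2 = sentence_bleu(reference, candidate, weights=(0.5, 0.5))
print(f"BLEU-2: {bleu2:.3f}")
```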

Implications and Future Prospects

The introduction of MaliGAN has profound implications for discrete sequence generation tasks. The proposed method mitigates issues like exposure bias and loss-evaluation mismatch that are prominent in traditional maximum-likelihood training. With an efficient and stable means to train GANs on discrete data, applications such as language modeling, dialogue generation, and machine translation could witness significant advancements.

As the next steps, the authors suggest applying MaliGAN to larger datasets and exploring its potential in various conditional generation contexts, such as optimized dialogue systems and diverse content generation scenarios. Moreover, the paper hints at the feasibility of adapting MaliGAN for tasks where latent structures in data sequences need efficient exploration and exploitation.

In summary, this paper provides a solid foundation for transferring the success of GANs in continuous domains to the challenging discrete data settings by intelligently integrating maximum likelihood principles, thereby expanding the utility and application of generative models in natural language processing and related fields.