How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? (1511.05101v1)

Published 16 Nov 2015 in stat.ML, cs.AI, cs.IT, cs.LG, and math.IT

Abstract: Modern applications and progress in deep learning research have created renewed interest for generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to the winning entry to the MSCOCO image captioning benchmark in 2015. Here we show that despite this impressive empirical performance, the objective function underlying scheduled sampling is improper and leads to an inconsistent learning algorithm. Secondly, we revisit the problems that scheduled sampling was meant to address, and present an alternative interpretation. We argue that maximum likelihood is an inappropriate training objective when the end-goal is to generate natural-looking samples. We go on to derive an ideal objective function to use in this situation instead. We introduce a generalisation of adversarial training, and show how such method can interpolate between maximum likelihood training and our ideal training objective. To our knowledge this is the first theoretical analysis that explains why adversarial training tends to produce samples with higher perceived quality.

Citations (290)

Summary

  • The paper challenges scheduled sampling by demonstrating its inconsistency through theoretical analysis of KL divergence.
  • It advocates shifting from traditional maximum likelihood to reverse KL minimization for better perceptual sample quality.
  • Introducing a generalized adversarial training framework, the paper offers a promising alternative for generating realistic output.

An Examination of Objective Functions for Generative Model Training

The paper "How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?" by Ferenc Husz challenges the efficacy of current training methodologies for generative models. It offers a critical assessment of scheduled sampling and explores alternative objective functions that better align with the goals of generating realistic samples. This essay provides a technical overview of the paper's findings and implications.

Critique of Scheduled Sampling

Scheduled sampling is scrutinized for both its theoretical underpinnings and its practical effectiveness. Despite the method's celebrated success in tasks like image captioning, the paper argues that its objective function is improper: it does not correspond to a strictly proper scoring rule, so the resulting learning algorithm is inconsistent and is not guaranteed to recover the data distribution even at convergence.
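
For reference, a scoring rule $S(Q, x)$ rewards a forecaster who reports distribution $Q$ when outcome $x$ is observed, and it is strictly proper when reporting the true distribution is the unique optimal strategy. A brief statement of this standard definition (general background, not the paper's own notation):

```latex
% A scoring rule S(Q, x) is *strictly proper* if the expected score under
% the data distribution P is uniquely maximised by reporting Q = P:
\[
  \mathbb{E}_{x \sim P}\!\left[ S(Q, x) \right]
  \;\le\;
  \mathbb{E}_{x \sim P}\!\left[ S(P, x) \right],
  \qquad \text{with equality if and only if } Q = P .
\]
% The logarithmic score S(Q, x) = \log Q(x) is strictly proper, which is
% why maximum likelihood estimation is consistent; the paper's point is
% that scheduled sampling's objective lacks this property.
```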

Huszár reinterprets scheduled sampling's training objective through the lens of Kullback-Leibler (KL) divergence and shows that the optimum of this objective is not the data distribution. By analytically unpacking scheduled sampling, the paper reveals that models trained via this method can degenerate into trivial solutions, calling for caution in its application.
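
To make the critique concrete, the sketch below illustrates the prefix-mixing step at the heart of scheduled sampling for an autoregressive sequence model: at each step, the token fed back into the model is the ground-truth token with probability `eps` and a sample from the model's own predictions otherwise. The `model_step` and `sample_from` callables are hypothetical stand-ins, not an interface from the paper.

```python
import random

def scheduled_sampling_prefix(model_step, sample_from, true_tokens, eps):
    """Build the conditioning prefix that scheduled sampling trains on.

    `eps` is annealed from 1 (pure teacher forcing) towards 0 during
    training. `model_step(prefix)` is assumed to return the model's
    predictive distribution over the next token, and `sample_from`
    draws a token from that distribution (both hypothetical).
    """
    prefix = []
    for true_tok in true_tokens:
        probs = model_step(prefix)       # model's prediction given prefix so far
        model_tok = sample_from(probs)   # the model's own guess
        # The coin flip scheduled sampling adds on top of teacher forcing:
        prefix.append(true_tok if random.random() < eps else model_tok)
    return prefix
```

The loss is still computed against the ground-truth targets even when the prefix contains model-generated tokens, which is precisely why the objective is no longer a proper likelihood of the data and why its optimum can drift away from the data distribution.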

Addressing Training Objectives

The paper revisits the foundational issues that scheduled sampling was intended to tackle, asserting that maximum likelihood is misaligned with the objective of generating perceptually pleasing samples. Departing from the traditional objective, the work advances a shift in focus towards minimising the KL divergence in the reverse direction, $KL[Q\|P]$ (where $P$ is the data distribution and $Q$ the model), as a more appropriate measure when generating naturalistic outputs.

The theoretical argument is that minimising $KL[P\|Q]$, as is typical in maximum likelihood estimation, frequently results in models that overgeneralise: $Q$ must cover every mode of $P$, and so places probability mass in regions where the data distribution has essentially none, yielding samples that deviate substantially from authentic data. Conversely, minimising $KL[Q\|P]$ promotes a mode-seeking behaviour that better captures the perceptual quality sought in generative outputs.
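
The contrast between the two directions is easy to demonstrate numerically. The sketch below (an illustration of the general phenomenon, not code from the paper) fits a single Gaussian $Q$ to a bimodal target $P$ on a grid, once under $KL[P\|Q]$ and once under $KL[Q\|P]$:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Target P: a bimodal mixture that a single-Gaussian model Q cannot match.
xs = np.linspace(-10.0, 10.0, 2001)
dx = xs[1] - xs[0]
p = 0.5 * norm.pdf(xs, loc=-3.0, scale=0.7) + 0.5 * norm.pdf(xs, loc=3.0, scale=0.7)

def kl(a, b):
    """Numerical KL[a||b] on the grid, skipping points where a has no mass."""
    mask = a > 1e-12
    return float(np.sum(a[mask] * np.log(a[mask] / np.maximum(b[mask], 1e-300))) * dx)

def q_pdf(params):
    mu, log_sigma = params
    return norm.pdf(xs, loc=mu, scale=np.exp(log_sigma))

# Forward KL[P||Q]: mode-covering -- Q spreads across both modes.
fwd = minimize(lambda th: kl(p, q_pdf(th)), x0=[0.5, 0.0], method="Nelder-Mead")
# Reverse KL[Q||P]: mode-seeking -- Q locks onto one of the two modes.
rev = minimize(lambda th: kl(q_pdf(th), p), x0=[0.5, 0.0], method="Nelder-Mead")

print("KL[P||Q] optimum: mu=%.2f sigma=%.2f" % (fwd.x[0], np.exp(fwd.x[1])))
print("KL[Q||P] optimum: mu=%.2f sigma=%.2f" % (rev.x[0], np.exp(rev.x[1])))
```

Under $KL[P\|Q]$ the optimum is a broad Gaussian straddling both modes, placing mass in the empty region between them; under $KL[Q\|P]$ the optimum collapses onto a single mode, the mode-seeking behaviour described above.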

Proposal of Generalised Adversarial Training

To address the difficulty of minimising $KL[Q\|P]$ directly, Huszár proposes a generalised adversarial training framework. The approach leverages a generalised form of the Jensen-Shannon (JS) divergence, in which a parameter $\pi$ interpolates between the properties of $KL[P\|Q]$ and $KL[Q\|P]$. This analysis provides theoretical justification for the observed tendency of adversarial training to yield higher-quality generative samples.
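
The central object is the generalised JS divergence. The following states its standard form and limiting behaviour, reconstructed from the usual definition with mixture weight $\pi$ (the paper's notation may differ in details):

```latex
% Generalised Jensen-Shannon divergence with mixture weight \pi \in (0, 1):
\[
  JS_{\pi}[P \| Q]
  = \pi \, KL\big[ P \,\big\|\, \pi P + (1 - \pi) Q \big]
  + (1 - \pi) \, KL\big[ Q \,\big\|\, \pi P + (1 - \pi) Q \big].
\]
% Suitably rescaled, it recovers both KL directions in the limits
\[
  \lim_{\pi \to 0} \tfrac{1}{\pi} JS_{\pi}[P \| Q] = KL[P \| Q],
  \qquad
  \lim_{\pi \to 1} \tfrac{1}{1 - \pi} JS_{\pi}[P \| Q] = KL[Q \| P],
\]
% while \pi = 1/2 gives the ordinary JS divergence that standard GAN
% training approximately minimises.
```

Under the adversarial-training reading, $\pi$ plays the role of the prior probability that the discriminator is shown a real rather than a generated sample, so varying the real-to-synthetic mixing ratio moves the effective objective between maximum likelihood ($\pi \to 0$) and the reverse-KL, sample-quality-oriented regime ($\pi \to 1$).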

Implications and Future Directions

The implications of this paper are significant for the domain of generative modeling, both theoretically and practically. The insights offered suggest a pivot from traditional maximum likelihood training towards methodologies that cater more directly to the perceptual properties desired in sample generation. The introduction of a flexible adversarial training mechanism broadens the horizons for future research, potentially impacting a wide array of applications from natural language processing to image synthesis.

Although adversarial training stands as a promising candidate, the work acknowledges current constraints related to high-dimensional sampling inefficiencies and application to discrete models. Thus, future research is prompted to address these limitations, improving the feasibility and robustness of adversarial approaches in generative contexts.

In conclusion, this paper contributes a substantial theoretical grounding that motivates the exploration of novel training objectives for generative models, setting the stage for continued advancement and refinement of methodologies in the field.
