- The paper challenges scheduled sampling by showing, through a KL-divergence analysis, that it yields an inconsistent learning algorithm.
- It advocates shifting from traditional maximum likelihood to reverse KL minimization for better perceptual sample quality.
- Introducing a generalized adversarial training framework, the paper offers a promising alternative for generating realistic output.
An Examination of Objective Functions for Generative Model Training
The paper "How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?" by Ferenc Husz challenges the efficacy of current training methodologies for generative models. It offers a critical assessment of scheduled sampling and explores alternative objective functions that better align with the goals of generating realistic samples. This essay provides a technical overview of the paper's findings and implications.
Critique of Scheduled Sampling
Scheduled sampling is scrutinized on both theoretical and practical grounds. Despite its celebrated success in tasks such as image captioning, the methodology exhibits fundamental flaws that undermine its reliability as a training strategy. The paper argues that scheduled sampling is an inconsistent learning algorithm because its objective is not a strictly proper scoring rule: even with unlimited data and model capacity, the optimum of the training objective need not coincide with the true data distribution.
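To make "strictly proper scoring rule" concrete: a scoring rule is strictly proper when the expected loss under the data distribution P is minimised only at Q = P. The toy check below (an illustration of the definition, not code from the paper) verifies this property for the log score; it is exactly the property the paper argues scheduled sampling's objective lacks.

```python
import numpy as np

def expected_log_loss(p, q):
    """E_{x~p}[-log q(x)]: the log score, a strictly proper scoring rule."""
    return float(-np.sum(p * np.log(q)))

P = np.array([0.7, 0.2, 0.1])                  # "data" distribution
candidates = {
    "truth":   P,                              # reporting P itself
    "blurred": np.array([0.5, 0.3, 0.2]),      # over-dispersed report
    "peaked":  np.array([0.9, 0.07, 0.03]),    # over-confident report
}
for name, Q in candidates.items():
    print(f"{name:8s} expected log-loss = {expected_log_loss(P, Q):.4f}")
# "truth" attains the minimum (~0.80 nats), as strict propriety requires;
# an objective without this property can be minimised by a wrong model.
```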
Huszár reinterprets scheduled sampling's training objective through the lens of Kullback-Leibler (KL) divergence and traces how this objective drives suboptimal outcomes. By analytically unpacking scheduled sampling, the paper shows that models trained this way tend to degenerate towards trivial solutions, and it calls for caution in applying the method.
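For readers who want the mechanics, here is a minimal sketch of the input-construction step that scheduled sampling adds to teacher forcing. The callable `model_probs` and the flat token representation are hypothetical simplifications, not the paper's notation; `epsilon` is the probability of feeding the ground-truth token and is annealed towards 0 over training.

```python
import numpy as np

def scheduled_sampling_inputs(model_probs, target_seq, epsilon, rng):
    """Build the conditioning tokens for one training step under scheduled sampling.

    At each timestep, the previous token is taken from the ground truth with
    probability `epsilon` (teacher forcing) and is otherwise sampled from the
    model's own prediction (free running).
    """
    inputs = [target_seq[0]]                  # condition on the true start token
    for t in range(1, len(target_seq)):
        if rng.random() < epsilon:
            prev = target_seq[t - 1]          # ground-truth token
        else:
            probs = model_probs(inputs[-1])   # model's next-token distribution
            prev = int(rng.choice(len(probs), p=probs))
        inputs.append(prev)
    return inputs

# Example with a 3-token vocabulary and a uniform stand-in model:
rng = np.random.default_rng(0)
uniform_model = lambda prev: np.ones(3) / 3
print(scheduled_sampling_inputs(uniform_model, [0, 1, 2, 1], epsilon=0.5, rng=rng))
```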
Addressing Training Objectives
The paper then revisits the foundational issue that scheduled sampling was intended to address, arguing that maximum likelihood itself is misaligned with the objective of generating perceptually pleasing samples. Departing from traditional objectives, it advocates minimising the KL divergence in the reverse direction, KL[Q∥P], as the more appropriate measure when the goal is naturalistic output.
The theoretical argument is that minimising KL[P∥Q], as maximum likelihood estimation does, frequently yields models that over-generalise, spreading probability mass over regions the data never occupies and producing samples that deviate substantially from the authentic distribution. Minimising KL[Q∥P], by contrast, promotes mode-seeking behaviour that better captures the perceptual quality sought in generative outputs, as the toy computation below illustrates.
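The snippet below (illustrative only, not from the paper) scores a spread-out model and a mode-committed model against a bimodal target: forward KL, KL[P∥Q], prefers the over-generalising model, while reverse KL, KL[Q∥P], prefers the mode-seeker.

```python
import numpy as np

def kl(p, q):
    """KL[p||q] for discrete distributions; terms with p_i = 0 contribute 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

P       = np.array([0.495, 0.01, 0.495])  # bimodal target: two modes, thin valley between
Q_broad = np.array([0.34, 0.32, 0.34])    # mean-seeking: covers both modes, mass in the valley
Q_mode  = np.array([0.98, 0.01, 0.01])    # mode-seeking: commits to a single mode

for name, Q in [("broad", Q_broad), ("mode", Q_mode)]:
    print(f"{name:5s} KL[P||Q] = {kl(P, Q):.3f}   KL[Q||P] = {kl(Q, P):.3f}")
# "broad" wins under KL[P||Q] (~0.34 vs ~1.59): maximum likelihood tolerates
# probability mass where the data has none. "mode" wins under KL[Q||P]
# (~0.63 vs ~0.85): reverse KL heavily penalises sampling from the valley.
```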
Proposal of Generalised Adversarial Training
To address the challenges of minimising KL[Q∥P] directly, Huszár proposes a generalised adversarial training framework. The approach leverages a generalised form of the Jensen-Shannon (JS) divergence in which a parameter π interpolates between the behaviours of KL[P∥Q] and KL[Q∥P]; the symmetric case π = 1/2 recovers the divergence implicitly minimised by standard generative adversarial networks. This perspective provides theoretical justification for the observed strength of adversarial training in producing high-quality samples.
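The sketch below implements the generalised JS divergence in its usual form, JS_π[P∥Q] = π KL[P∥M] + (1−π) KL[Q∥M] with mixture M = πP + (1−π)Q, and checks numerically that a rescaled version approaches KL[P∥Q] as π → 0 and KL[Q∥P] as π → 1. The π(1−π) normalisation is one convenient choice for exposing these limits, not necessarily the paper's.

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_pi(p, q, pi):
    """Generalised JS divergence: pi*KL[p||m] + (1-pi)*KL[q||m], m = pi*p + (1-pi)*q."""
    m = pi * p + (1 - pi) * q
    return pi * kl(p, m) + (1 - pi) * kl(q, m)

P = np.array([0.495, 0.01, 0.495])
Q = np.array([0.34, 0.32, 0.34])

print(f"KL[P||Q] = {kl(P, Q):.3f}   KL[Q||P] = {kl(Q, P):.3f}")
for pi in (1e-4, 0.5, 1 - 1e-4):
    # Rescaling by pi*(1-pi) exposes the endpoint behaviour: the value tends
    # to KL[P||Q] as pi -> 0 and to KL[Q||P] as pi -> 1.
    print(f"pi = {pi:.4f}   JS_pi / (pi*(1-pi)) = {js_pi(P, Q, pi) / (pi * (1 - pi)):.3f}")
```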
Implications and Future Directions
The implications of this paper are significant for the domain of generative modeling, both theoretically and practically. Its insights suggest a pivot from traditional maximum likelihood training towards methodologies that cater directly to the perceptual properties desired in generated samples, and the flexible adversarial training mechanism it introduces broadens the scope for future research, with potential impact on applications from natural language processing to image synthesis.
Although adversarial training stands as a promising candidate, the work acknowledges current constraints: sampling-based adversarial objectives are inefficient in high dimensions, and the approach is difficult to apply to discrete models. Future research is therefore needed to address these limitations and to improve the feasibility and robustness of adversarial approaches in generative contexts.
In conclusion, this paper contributes a substantial theoretical grounding that motivates the exploration of novel training objectives for generative models, setting the stage for continued advancement and refinement of methodologies in the field.