Reward Augmented Maximum Likelihood for Neural Structured Prediction (1609.00150v3)

Published 1 Sep 2016 in cs.LG

Abstract: A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient approach to incorporate task reward into a maximum likelihood framework. By establishing a link between the log-likelihood and expected reward objectives, we show that an optimal regularized expected reward is achieved when the conditional distribution of the outputs given the inputs is proportional to their exponentiated scaled rewards. Accordingly, we present a framework to smooth the predictive probability of the outputs using their corresponding rewards. We optimize the conditional log-probability of augmented outputs that are sampled proportionally to their exponentiated scaled rewards. Experiments on neural sequence to sequence models for speech recognition and machine translation show notable improvements over a maximum likelihood baseline by using reward augmented maximum likelihood (RAML), where the rewards are defined as the negative edit distance between the outputs and the ground truth labels.

Authors (7)
  1. Mohammad Norouzi (81 papers)
  2. Samy Bengio (75 papers)
  3. Zhifeng Chen (65 papers)
  4. Navdeep Jaitly (67 papers)
  5. Mike Schuster (9 papers)
  6. Yonghui Wu (115 papers)
  7. Dale Schuurmans (112 papers)
Citations (248)

Summary

Reward Augmented Maximum Likelihood for Neural Structured Prediction

The paper, "Reward Augmented Maximum Likelihood for Neural Structured Prediction," introduces a novel method aimed at improving the training of neural networks in structured prediction tasks by directly incorporating task-specific reward metrics. This methodology, Reward Augmented Maximum Likelihood (RAML), operates within a maximum likelihood framework, augmented by a step that samples outputs based on their scaled rewards. This approach aims to reconcile the computational efficiency of maximum likelihood estimation (MLE) with the more challenging objective of expected task reward maximization.

Motivation and Methodology

Structured prediction, a core problem in machine learning, involves predicting interdependent output structures, such as sequences in NLP and speech recognition. Traditional maximum likelihood (ML) training, as employed in sequence-to-sequence models built from recurrent neural networks (RNNs) with LSTM cells, maximizes the log-likelihood of the ground truth outputs given the inputs. However, this objective does not directly account for task-specific rewards, such as the BLEU score in machine translation or the word error rate in speech recognition, which are what test-time evaluation actually measures. Reinforcement learning techniques, by contrast, optimize expected task reward directly, but they suffer from high-variance gradient estimates and sample inefficiency because the reward functions are non-differentiable and must be optimized through sampling.
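
To make the contrast concrete, the two objectives can be written roughly in the paper's notation, where D is the training set, r(y, y*) the task reward, and tau > 0 a temperature; the following is a sketch of the setup rather than a verbatim quotation of the paper's equations.

```latex
% Conditional log-likelihood vs. entropy-regularized expected reward
% (sketched from the paper's setup; constants and normalization details
% may differ slightly from the original statement).
\mathcal{L}_{\mathrm{ML}}(\theta)
  = \sum_{(x,\, y^{*}) \in \mathcal{D}} -\log p_{\theta}(y^{*} \mid x)

\mathcal{L}_{\mathrm{RL}}(\theta; \tau)
  = \sum_{(x,\, y^{*}) \in \mathcal{D}}
    \Bigl( -\tau\, \mathbb{H}\bigl(p_{\theta}(\cdot \mid x)\bigr)
           \;-\; \sum_{y \in \mathcal{Y}} p_{\theta}(y \mid x)\, r(y, y^{*}) \Bigr)
```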

RAML mitigates these issues with a hybrid approach. During training it samples auxiliary outputs according to their similarity to the ground truth, with a temperature parameter controlling how tightly the sampling concentrates around the reference: each candidate output is weighted by its exponentiated, scaled task reward, so task-specific reward information enters the learning process smoothly. The resulting objective bridges MLE and reward-based optimization; both can be expressed as a KL divergence between an exponentiated payoff (reward) distribution and the model distribution, but taken in opposite directions (see the sketch below).
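
The following is a minimal, illustrative sketch of the sampling step, assuming the reward is negative Hamming distance (substitution-only edits) over a fixed vocabulary, the simplified setting the paper uses to make sampling tractable; the function and variable names are hypothetical and not taken from the paper's code.

```python
import math
import random

def sample_augmented_target(y_star, vocab, temperature=0.9):
    """Sample an augmented target y with probability proportional to
    exp(r(y, y_star) / temperature), where the reward r is the negative
    Hamming distance and only substitutions are allowed.  (Hypothetical
    helper, not the paper's code.)"""
    n = len(y_star)
    v = len(vocab) - 1  # substitutions per position that change the token
    # Total unnormalized mass of all sequences at Hamming distance d:
    #   exp(-d / tau) * C(n, d) * v**d
    weights = [math.exp(-d / temperature) * math.comb(n, d) * v**d
               for d in range(n + 1)]
    d = random.choices(range(n + 1), weights=weights, k=1)[0]

    # Pick d distinct positions and substitute each with a different token.
    y = list(y_star)
    for pos in random.sample(range(n), d):
        y[pos] = random.choice([tok for tok in vocab if tok != y_star[pos]])
    return y

# Hypothetical usage: the sampled sequence replaces the ground truth as the
# target for the usual cross-entropy (log-likelihood) loss.
vocab = ["a", "b", "c", "d"]
y_star = ["a", "b", "c", "a", "d"]
print(sample_augmented_target(y_star, vocab, temperature=0.9))
```

In a training loop, the sampled sequence would simply replace y_star as the target of the standard cross-entropy loss, which is the only change relative to ordinary maximum likelihood training.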

Theoretical Insights

The paper provides a theoretical foundation for RAML, showing that the approach minimizes a KL divergence between an exponentiated payoff distribution and the model distribution, which directly links the expected reward and likelihood objectives. Although RAML optimizes this surrogate criterion rather than expected reward itself, the gap between the two objectives can be characterized: at non-zero temperatures it is expressed in terms of variances measured along distributions that interpolate between the exponentiated payoff distribution and the model distribution. This connection highlights RAML's capacity to reflect structure in the reward distribution that purely likelihood-based training ignores.
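
The relationship alluded to above can be sketched as follows, roughly in the paper's notation, with q(y | y*; tau) proportional to exp(r(y, y*)/tau) denoting the exponentiated payoff distribution and constants independent of theta dropped.

```latex
% Regularized expected reward and RAML as KL divergences in opposite
% directions (a sketch; constants independent of \theta are omitted).
\mathcal{L}_{\mathrm{RL}}(\theta; \tau)
  = \tau \sum_{(x,\, y^{*})}
    D_{\mathrm{KL}}\bigl(p_{\theta}(\cdot \mid x)\, \big\|\, q(\cdot \mid y^{*}; \tau)\bigr)
    + \text{const}

\mathcal{L}_{\mathrm{RAML}}(\theta; \tau)
  = \sum_{(x,\, y^{*})}
    D_{\mathrm{KL}}\bigl(q(\cdot \mid y^{*}; \tau)\, \big\|\, p_{\theta}(\cdot \mid x)\bigr)
    + \text{const}
```

Minimizing the second divergence keeps the gradient identical in form to the maximum likelihood gradient, only with targets drawn from q instead of a point mass on the ground truth, which is what makes the method computationally cheap.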

Empirical Results

The empirical evaluations, on machine translation (WMT'14 English to French) and speech recognition (TIMIT), demonstrate the efficacy of RAML. On machine translation, RAML shows notable improvements over state-of-the-art maximum likelihood models, achieving higher BLEU scores. On speech recognition, it delivers clear reductions in phone error rate (PER) relative to ML training. These results indicate that RAML can not only match but surpass the performance of existing sequence-to-sequence models, while adding only the modest computational overhead of sampling augmented outputs during training.

Implications and Future Directions

RAML holds substantial implications for advancing structured prediction models in artificial intelligence. Practically, it provides a viable pathway to enhance the performance of neural networks on structured output tasks by effectively integrating reward objectives into the learning process. The theoretical implications suggest a deeper understanding of how reward and likelihood objectives can coexist within a learning framework. Future directions may explore extending RAML to more complex probabilistic models and broader contexts, where task-specific reward metrics necessitate adaptive, reward-sensitive training algorithms.

In conclusion, Reward Augmented Maximum Likelihood represents a significant step toward aligning structured prediction models more closely with their evaluation metrics, and it offers a promising avenue for future exploration of reward-centric learning paradigms within AI.
