Task Loss Estimation for Sequence Prediction (1511.06456v4)
Abstract: Often, the performance on a supervised machine learning task is evaluated with a *task loss* function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a *surrogate loss* function, such as cross-entropy or hinge loss. For this remedy to be effective, it is important to ensure that minimization of the surrogate loss results in minimization of the task loss, a condition that we call *consistency with the task loss*. In this work, we propose another method for deriving differentiable surrogate losses that provably meet this requirement. We focus on the broad class of models that define a score for every input-output pair. Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss. A distinct feature of this approach is that it defines the desirable value of the score for every input-output pair. We use this property to design specialized surrogate losses for Encoder-Decoder models, which are often used for sequence prediction tasks. In our experiments, we benchmark on the task of speech recognition. Using the new surrogate loss instead of cross-entropy to train an Encoder-Decoder speech recognizer brings a significant ~13% relative improvement in Character Error Rate (CER) when no extra corpora are used for language modeling.
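
To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of the abstract's recipe: the model's score for an (input, output) pair is regressed onto the non-differentiable task loss, here the edit distance, and the squared estimation error serves as the differentiable surrogate. The function names `edit_distance` and `surrogate_loss` are illustrative assumptions, not identifiers from the paper.

```python
# Sketch of task loss estimation as a surrogate loss, assuming PyTorch.
# The task loss (edit distance) is not differentiable, but the squared
# error between the model's score and that task loss is.
import torch


def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance: the non-differentiable task loss."""
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(hyp)][len(ref)]


def surrogate_loss(score: torch.Tensor, hyp: str, ref: str) -> torch.Tensor:
    """Squared estimation error between the model's score and the task loss.

    `score` is the (differentiable) scalar the model assigns to the
    (input, hypothesis) pair; the target is the true task loss, so
    minimizing this loss drives the score towards the task loss value.
    """
    target = torch.tensor(float(edit_distance(hyp, ref)))
    return (score - target) ** 2
```

Because the score is trained to equal the task loss of each candidate output, picking the lowest-scoring output at decoding time approximately minimizes the task loss itself, which is the consistency property the abstract refers to.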
- Dzmitry Bahdanau (46 papers)
- Dmitriy Serdyuk (20 papers)
- Nan Rosemary Ke (40 papers)
- Jan Chorowski (29 papers)
- Aaron Courville (201 papers)
- Yoshua Bengio (601 papers)
- Philémon Brakel (3 papers)