Token-level and sequence-level loss smoothing for RNN language models (1805.05062v1)

Published 14 May 2018 in cs.CL and cs.CV

Abstract: Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from "exposure bias": during training, tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations, we build upon the recent reward augmented maximum likelihood approach, i.e., sequence-level smoothing, which encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary and significantly improve results.
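
To make the token-level idea concrete, here is a minimal sketch (not the authors' code) of a smoothed cross-entropy loss for an RNN language model: instead of a one-hot target, the loss is computed against a distribution that keeps most mass on the gold token and spreads the rest over the vocabulary. The uniform spreading used below is an assumption for simplicity; the paper's token-level smoothing would replace it with a similarity-based distribution over tokens.

```python
import torch
import torch.nn.functional as F

def token_smoothed_loss(logits, targets, epsilon=0.1):
    """Smoothed token-level cross-entropy.

    logits:  (batch, seq_len, vocab) unnormalized scores from the RNN LM
    targets: (batch, seq_len) gold token indices
    epsilon: mass redistributed away from the gold token (assumed value)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Negative log-likelihood of the gold token (the usual MLE term).
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Uniform smoothing term over the whole vocabulary; the paper's variant
    # would weight tokens by their similarity to the gold token instead.
    smooth = -log_probs.mean(dim=-1)
    return ((1.0 - epsilon) * nll + epsilon * smooth).mean()
```

Sequence-level smoothing (the reward augmented maximum likelihood setting the paper builds on) works analogously at the sentence level, training on sequences sampled near the ground truth in proportion to a task reward such as BLEU or CIDEr, rather than reweighting individual tokens.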

Authors (3)
  1. Maha Elbayad (17 papers)
  2. Laurent Besacier (76 papers)
  3. Jakob Verbeek (59 papers)
Citations (19)
