Improving Text Generation with Student-Forcing Optimal Transport (2010.05994v1)

Published 12 Oct 2020 in cs.CL and cs.LG

Abstract: Neural language models are often trained with maximum likelihood estimation (MLE), where the next word is generated conditioned on the ground-truth word tokens. During testing, however, the model is instead conditioned on previously generated tokens, resulting in what is termed exposure bias. To reduce this gap between training and testing, we propose using optimal transport (OT) to match the sequences generated in these two modes. An extension is further proposed to improve the OT learning, based on the structural and contextual information of the text sequences. The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
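The paper specifies the exact OT formulation and its structural/contextual extension; as a rough illustration of the general idea only (not the authors' implementation), the sketch below computes an entropy-regularized OT loss via Sinkhorn iterations between the token embeddings of a teacher-forced decoding and a student-forced (free-running) decoding. All names here (sinkhorn_ot_loss, teacher_emb, student_emb, eps, n_iters) are hypothetical, and the cosine-distance cost is an assumption.

```python
import torch
import torch.nn.functional as F

def sinkhorn_ot_loss(teacher_emb, student_emb, eps=0.1, n_iters=50):
    """Illustrative entropy-regularized OT (Sinkhorn) loss between the token
    embeddings of a teacher-forced sequence (T, d) and a student-forced
    sequence (S, d). Not the paper's exact formulation."""
    # Cost matrix: cosine distance between every pair of token embeddings.
    t = F.normalize(teacher_emb, dim=-1)        # (T, d)
    s = F.normalize(student_emb, dim=-1)        # (S, d)
    cost = 1.0 - t @ s.T                        # (T, S)

    T_len, S_len = cost.shape
    # Uniform marginals over the two sequences, kept in log space.
    log_mu = torch.full((T_len,), 1.0 / T_len, device=cost.device).log()
    log_nu = torch.full((S_len,), 1.0 / S_len, device=cost.device).log()

    # Dual potentials, updated in the log domain for numerical stability.
    f = torch.zeros(T_len, device=cost.device)
    g = torch.zeros(S_len, device=cost.device)
    for _ in range(n_iters):
        M = (-cost + f[:, None] + g[None, :]) / eps
        f = f + eps * (log_mu - torch.logsumexp(M, dim=1))
        M = (-cost + f[:, None] + g[None, :]) / eps
        g = g + eps * (log_nu - torch.logsumexp(M, dim=0))

    # Transport plan and the resulting OT loss (cost weighted by the plan).
    plan = torch.exp((-cost + f[:, None] + g[None, :]) / eps)
    return (plan * cost).sum()
```

In a training loop such a term would typically be added to the usual MLE objective, e.g. loss = mle_loss + lambda_ot * sinkhorn_ot_loss(teacher_emb, student_emb), where lambda_ot is a hypothetical weighting coefficient.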

Authors (13)
  1. Guoyin Wang (108 papers)
  2. Chunyuan Li (122 papers)
  3. Jianqiao Li (5 papers)
  4. Hao Fu (82 papers)
  5. Yuh-Chen Lin (2 papers)
  6. Liqun Chen (42 papers)
  7. Yizhe Zhang (127 papers)
  8. Chenyang Tao (29 papers)
  9. Ruiyi Zhang (98 papers)
  10. Wenlin Wang (27 papers)
  11. Dinghan Shen (34 papers)
  12. Qian Yang (146 papers)
  13. Lawrence Carin (203 papers)
Citations (17)
