Improving Sequence-to-Sequence Learning via Optimal Transport (1901.06283v1)

Published 18 Jan 2019 in cs.CL

Abstract: Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This procedure focuses on modeling local syntactic patterns, and may fail to capture long-range semantic structure. We present a novel solution to alleviate these issues. Our approach imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features. We further show that this method can be understood as a Wasserstein gradient flow trying to match our model to the ground truth sequence distribution. Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning.
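To make the idea of sequence-level OT supervision concrete, here is a minimal sketch of an entropy-regularized optimal transport loss between the token embeddings of a generated sentence and its reference, computed with standard Sinkhorn iterations. Note this is an illustration, not the paper's implementation: the paper uses the IPOT algorithm rather than Sinkhorn, and all names and hyperparameters below (`sinkhorn_ot_loss`, `eps`, `n_iters`) are assumptions for the sketch.

```python
import torch

def sinkhorn_ot_loss(pred_emb, ref_emb, eps=0.1, n_iters=50):
    """Entropic OT distance between two sequences of token embeddings.

    pred_emb: (m, d) embeddings of the generated tokens
    ref_emb:  (n, d) embeddings of the reference tokens
    """
    # Cost matrix: cosine distance between every pair of token embeddings.
    pred_n = pred_emb / pred_emb.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    ref_n = ref_emb / ref_emb.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    C = 1.0 - pred_n @ ref_n.t()                    # (m, n)

    # Uniform marginals over the two token sequences.
    m, n = C.shape
    a = torch.full((m,), 1.0 / m, device=C.device)
    b = torch.full((n,), 1.0 / n, device=C.device)

    # Sinkhorn fixed-point updates on the Gibbs kernel.
    K = torch.exp(-C / eps)
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u).clamp_min(1e-8)
        u = a / (K @ v).clamp_min(1e-8)

    T = torch.diag(u) @ K @ torch.diag(v)           # transport plan
    return (T * C).sum()                            # OT cost <T, C>
```

In training, a loss of this form would be added to the usual word-level MLE objective; to keep it differentiable, the generated-token embeddings are typically obtained from a soft relaxation of the decoder's output distribution (e.g., a soft-argmax over the vocabulary) rather than from hard-sampled tokens.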

Authors (10)
  1. Liqun Chen (42 papers)
  2. Yizhe Zhang (127 papers)
  3. Ruiyi Zhang (98 papers)
  4. Chenyang Tao (29 papers)
  5. Zhe Gan (135 papers)
  6. Haichao Zhang (40 papers)
  7. Bai Li (33 papers)
  8. Dinghan Shen (34 papers)
  9. Changyou Chen (108 papers)
  10. Lawrence Carin (203 papers)
Citations (88)
