Why Neural Machine Translation Prefers Empty Outputs (2012.13454v1)
Published 24 Dec 2020 in cs.CL
Abstract: We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to eventually outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an implicit smoothing that increases the probability of zero-length translations. Using different EoS types in target sentences of different lengths exposes and eliminates this implicit smoothing.
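The toy calculation below is a minimal sketch of the first mechanism, not the authors' code. It assumes that under label-smoothing factor eps, a trained model's per-token confidence is roughly capped at 1 - eps, and that the ever-present EoS word gets a modest floor probability (the 0.05 value is hypothetical). The `add_eos` helper at the end sketches the paper's proposed fix of length-dependent EoS types; the bucket boundaries are illustrative assumptions.

```python
import numpy as np

def seq_logprob(per_token_prob, length):
    """Log-probability of a sequence whose `length` tokens are each
    predicted with probability `per_token_prob`."""
    return length * np.log(per_token_prob)

# Compare a 20-token reference translation against the empty translation
# (emit EoS at step 1) under increasing label-smoothing factors.
for eps in [0.0, 0.1, 0.3]:
    conf = 1.0 - eps                              # per-token confidence cap under smoothing
    lp_correct = seq_logprob(conf, length=20)     # correct-length hypothesis
    lp_empty = np.log(0.05)                       # hypothetical step-1 EoS probability
    print(f"eps={eps:.1f}: logP(correct)={lp_correct:.2f}, "
          f"logP(empty)={lp_empty:.2f}, empty wins: {lp_empty > lp_correct}")

def add_eos(tokens, length_buckets=(5, 10, 20)):
    """Sketch of the paper's fix: tag EoS with a length bucket so that
    sentences of different lengths no longer share one high-frequency
    EoS type (bucket boundaries here are hypothetical)."""
    for b in length_buckets:
        if len(tokens) <= b:
            return tokens + [f"<eos_{b}>"]
    return tokens + ["<eos_long>"]
```

With eps = 0.3, the accumulated per-token discount over 20 tokens pushes the correct-length hypothesis below the one-step empty hypothesis, which is the effect the abstract's first explanation describes; bucketed EoS types remove the shared, every-sentence EoS that drives the second, implicit form of smoothing.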
- Xing Shi
- Yijun Xiao
- Kevin Knight