
Modeling Target-Side Inflection in Neural Machine Translation (1707.06012v2)

Published 19 Jul 2017 in cs.CL

Abstract: NMT systems have problems with large vocabulary sizes. Byte-pair encoding (BPE) is a popular approach to solving this problem, but while BPE allows the system to generate any target-side word, it does not enable effective generalization over the rich vocabulary in morphologically rich languages with strong inflectional phenomena. We introduce a simple approach to overcome this problem by training a system to produce the lemma of a word and its morphologically rich POS tag, which is then followed by a deterministic generation step. We apply this strategy for English-Czech and English-German translation scenarios, obtaining improvements in both settings. We furthermore show that the improvement is not due to only adding explicit morphological information.

Authors (3)
  1. Aleš Tamchyna (3 papers)
  2. Marion Weller-Di Marco (2 papers)
  3. Alexander Fraser (50 papers)
Citations (45)
