Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder (1901.01590v1)

Published 6 Jan 2019 in cs.CL and cs.LG

Abstract: Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised neural translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on translation performance, which provides a better understanding of how the cross-lingual word embedding is learned and used in translation.
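To make the two ideas in the abstract concrete, the sketch below is a minimal, hypothetical Python rendering of them: a beam search that scores each target-word candidate by a weighted mix of cross-lingual embedding similarity and language-model probability (the "context-aware search"), plus a local word-swap noise function of the kind commonly used to train a denoising autoencoder to undo reordering. The toy lexicon, the placeholder LM, and all parameter values are assumptions for illustration, not the paper's actual implementation.

```python
import random

# Toy cross-lingual lexicon: source word -> [(target candidate, embedding
# cosine similarity)]. In the paper these candidates come from unsupervised
# cross-lingual word embeddings; the entries here are made up.
lexicon = {
    "haus": [("house", 0.71), ("home", 0.64)],
    "gross": [("big", 0.68), ("large", 0.62)],
}

def lm_logprob(word, history):
    """Placeholder target-side language model. A real system would score
    `word` given `history` with an n-gram or neural LM."""
    return -1.0

def translate_context_aware(source_words, lam=0.5, beam_size=2):
    """Word-by-word translation with LM-integrated beam search: each
    hypothesis is scored by a weighted sum of embedding similarity
    (lexical evidence) and LM log-probability (contextual evidence)."""
    beam = [([], 0.0)]  # list of (partial translation, score)
    for src in source_words:
        candidates = lexicon.get(src, [(src, 0.0)])  # pass unknown words through
        expanded = [
            (hyp + [tgt], score + lam * sim + (1.0 - lam) * lm_logprob(tgt, hyp))
            for hyp, score in beam
            for tgt, sim in candidates
        ]
        beam = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam_size]
    return beam[0][0]

def local_swap_noise(words, window=3):
    """Reordering noise for denoising-autoencoder training: jitter each
    position by up to `window`, so the model learns to restore the original
    order. This is one common noise type; the paper compares several."""
    order = sorted(range(len(words)), key=lambda i: i + random.uniform(0, window))
    return [words[i] for i in order]

print(translate_context_aware(["haus", "gross"]))     # e.g. ['house', 'big']
print(local_swap_noise(["a", "house", "is", "big"]))  # randomly jittered order
```

In the actual system the language model rescores the embedding-based lexical choices and the denoising autoencoder post-edits the word-by-word output; the script above compresses both mechanics into one toy example.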

Authors (3)
  1. Yunsu Kim (40 papers)
  2. Jiahui Geng (24 papers)
  3. Hermann Ney (104 papers)
Citations (39)
