
Learning Multilingual Word Representations using a Bag-of-Words Autoencoder (1401.1803v1)

Published 8 Jan 2014 in cs.CL, cs.LG, and stat.ML

Abstract: Recent work on learning multilingual word representations usually relies on word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. The autoencoder is trained to reconstruct the bag-of-words representation of a given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level alignments to learn word representations.
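To make the setup concrete, below is a minimal sketch (in PyTorch; this is not the authors' code) of a cross-lingual bag-of-words autoencoder in the spirit the abstract describes: an encoder maps a sentence's bag-of-words in one language to a shared representation, and a decoder is trained to reconstruct the bag-of-words of its translation, with no word-level alignments needed. The vocabulary sizes, hidden dimension, tanh nonlinearity, and the sum-of-log-probabilities loss are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class BagOfWordsAutoencoder(nn.Module):
    """Sketch of a cross-lingual bag-of-words autoencoder (hypothetical)."""

    def __init__(self, src_vocab_size, tgt_vocab_size, dim=128):
        super().__init__()
        # Encoder weight rows act as source-language word embeddings;
        # decoder weight columns act as target-language word embeddings.
        self.encoder = nn.Linear(src_vocab_size, dim)
        self.decoder = nn.Linear(dim, tgt_vocab_size)

    def forward(self, src_bow):
        # src_bow: (batch, src_vocab_size) word-count vectors.
        h = torch.tanh(self.encoder(src_bow))
        return self.decoder(h)  # logits over the target vocabulary

# One toy training step: reconstruct the translation's bag-of-words
# from the source sentence's bag-of-words (no alignments involved).
model = BagOfWordsAutoencoder(src_vocab_size=5000, tgt_vocab_size=5000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

src_bow = torch.rand(32, 5000)  # stand-ins for real word-count vectors
tgt_bow = torch.rand(32, 5000)
logits = model(src_bow)
# Score each target word's log-probability, weighted by its count in
# the translation's bag-of-words.
loss = -(torch.log_softmax(logits, dim=-1) * tgt_bow).sum(dim=-1).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

After training on translated sentence pairs, the learned embedding matrices place source- and target-language words in a shared space, which is what enables the cross-lingual document classification evaluation described in the abstract.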

Authors (3)
  1. Stanislas Lauly (7 papers)
  2. Alex Boulanger (1 paper)
  3. Hugo Larochelle (87 papers)
Citations (40)
