Consistent Alignment of Word Embedding Models (1702.07680v1)

Published 24 Feb 2017 in cs.CL, cs.IR, and stat.ML

Abstract: Word embedding models offer continuous vector representations that can capture rich contextual semantics based on word co-occurrence patterns. While these word vectors can provide very effective features for many NLP tasks, such as clustering similar words and inferring linguistic relationships, many challenges and open research questions remain. In this paper, we propose a solution that aligns variations of the same model (or different models) in a joint low-dimensional latent space, leveraging carefully generated synthetic data points. This generative process is inspired by the observation that a variety of linguistic relationships are captured by simple linear operations in embedded space. We demonstrate that our approach can lead to substantial improvements in recovering the embeddings of local neighborhoods.
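
The abstract's two key ingredients, synthetic points generated by linear operations in embedding space and a joint alignment of multiple models, can be illustrated with a short sketch. The code below is not the authors' implementation: it fabricates two toy embedding models as rotated copies of one another, builds corresponding synthetic points from analogy-style offsets c + (b - a), and uses an orthogonal Procrustes map as the alignment step (a standard choice assumed here for illustration; the paper's actual latent-space construction may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

def analogy_points(emb, triples):
    """Synthetic vectors via linear operations: for each word triple
    (a, b, c), emit c + (b - a), mimicking analogy structure."""
    a, b, c = emb[triples[:, 0]], emb[triples[:, 1]], emb[triples[:, 2]]
    return c + (b - a)

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, solved in closed form
    via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Two toy "models" over a shared vocabulary: Y is a hidden rotation of X
# plus noise, standing in for two training runs of the same model.
vocab, dim = 500, 50
X = rng.normal(size=(vocab, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # unknown rotation
Y = X @ Q + 0.01 * rng.normal(size=(vocab, dim))

# Generate corresponding synthetic points in both spaces from the same
# word triples, then fit the alignment on the augmented point sets.
triples = rng.integers(0, vocab, size=(200, 3))
X_aug = np.vstack([X, analogy_points(X, triples)])
Y_aug = np.vstack([Y, analogy_points(Y, triples)])

W = procrustes_map(X_aug, Y_aug)
print("relative alignment error:", np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))
```

Because the synthetic points are linear combinations of existing vectors, they respect whatever linear structure the two spaces share, which is what makes them useful as extra anchors when fitting the alignment.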

Authors (4)
  1. Cem Safak Sahin (2 papers)
  2. Rajmonda S. Caceres (10 papers)
  3. Brandon Oselio (9 papers)
  4. William M. Campbell (14 papers)
Citations (5)
