What the Vec? Towards Probabilistically Grounded Embeddings (1805.12164v3)

Published 30 May 2018 in cs.CL, cs.LG, and stat.ML

Abstract: Word2Vec (W2V) and GloVe are popular, fast, and efficient word embedding algorithms. Their embeddings are widely used and perform well on a variety of natural language processing tasks. Moreover, W2V has recently been adopted in the field of graph embedding, where it underpins several leading algorithms. However, despite their ubiquity and relatively simple model architecture, a theoretical understanding of what the embedding parameters of W2V and GloVe learn, and why that is useful in downstream tasks, has been lacking. We show that different interactions between PMI vectors reflect semantic word relationships, such as similarity and paraphrasing, that are encoded in low-dimensional word embeddings under a suitable projection, theoretically explaining why the embeddings of W2V and GloVe work. As a consequence, we also reveal an interesting mathematical interconnection between the considered semantic relationships themselves.
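
To make the PMI-to-embedding connection concrete, here is a minimal sketch in the spirit of the PMI-factorisation view of W2V (Levy & Goldberg, 2014) that this line of work builds on. It is not the paper's own method: the toy corpus, the positive-PMI variant, and the SVD projection are all illustrative assumptions chosen to show how low-dimensional word vectors can be obtained as a projection of PMI vectors.

```python
# Illustrative sketch (not the paper's method): build PMI vectors from
# co-occurrence counts on a toy corpus, then project them to a low
# dimension with a truncated SVD to obtain word embeddings.
import numpy as np
from collections import defaultdict
from itertools import combinations

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric co-occurrence counts; here the context window is the sentence.
C = np.zeros((V, V))
for sent in corpus:
    for w1, w2 in combinations(sent, 2):
        C[idx[w1], idx[w2]] += 1
        C[idx[w2], idx[w1]] += 1

# Positive PMI: max(0, log p(w,c) / (p(w) p(c))), clipped to avoid -inf
# at zero counts. Each row of `ppmi` is a word's PMI vector.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
ppmi = np.maximum(np.log(np.maximum(C / total, 1e-12) / (pw @ pw.T)), 0.0)

# Low-dimensional projection: truncated SVD of the PPMI matrix.
# Rows of `emb` are the resulting d-dimensional word embeddings.
U, S, _ = np.linalg.svd(ppmi)
d = 4
emb = U[:, :d] * np.sqrt(S[:d])

def cos(a, b):
    """Cosine similarity between the embeddings of two words."""
    u, v = emb[idx[a]], emb[idx[b]]
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

# Words appearing in similar contexts end up with similar embeddings.
print(f"cos(cat, dog) = {cos('cat', 'dog'):.3f}")
print(f"cos(cat, rug) = {cos('cat', 'rug'):.3f}")
```

Under this view, a word's embedding is a low-rank projection of its PMI vector, which is why interactions (e.g. dot products) between embeddings can approximate the PMI statistics that encode similarity and paraphrasing relationships.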

Authors (3)
  1. Carl Allen (16 papers)
  2. Ivana Balažević (15 papers)
  3. Timothy Hospedales (101 papers)
Citations (22)
