Word Mover's Embedding: From Word2Vec to Document Embedding (1811.01713v1)

Published 30 Oct 2018 in cs.CL, cs.AI, cs.LG, and stat.ML

Abstract: While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending it to generate unsupervised sentence or document embeddings. Recent work has demonstrated that a distance measure between documents called \emph{Word Mover's Distance} (WMD), which aligns semantically similar words, yields unprecedented KNN classification accuracy. However, WMD is expensive to compute, and it is hard to extend its use beyond a KNN classifier. In this paper, we propose the \emph{Word Mover's Embedding} (WME), a novel approach to building an unsupervised document (sentence) embedding from pre-trained word embeddings. In our experiments on 9 benchmark text classification datasets and 22 textual similarity tasks, the proposed technique consistently matches or outperforms state-of-the-art techniques, with significantly higher accuracy on problems involving short text.
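
As a rough sketch of the idea (not the paper's exact construction): WMD solves an optimal-transport problem between the word vectors of two documents, and WME approximates the resulting WMD-based kernel with random features, i.e., WMD evaluations against a set of randomly generated documents. The minimal illustration below assumes the POT library (`ot`) for the transport solver; the uniform random-document sampling, document lengths, and `gamma` value are simplified placeholders rather than the paper's tuned scheme.

```python
# Sketch of Word Mover's Distance (WMD) and the Word Mover's Embedding (WME)
# random-features idea. Requires numpy and POT (pip install pot).
import numpy as np
import ot  # Python Optimal Transport


def wmd(doc_a, doc_b):
    """WMD between two documents, each an (n_words, dim) array of word
    vectors, with uniform weight on each word."""
    a = np.full(len(doc_a), 1.0 / len(doc_a))
    b = np.full(len(doc_b), 1.0 / len(doc_b))
    # Cost matrix: Euclidean distance between every pair of word vectors.
    M = ot.dist(doc_a, doc_b, metric="euclidean")
    return ot.emd2(a, b, M)  # exact optimal-transport cost


def wme_embedding(doc, random_docs, gamma=1.0):
    """Map a document to an R-dimensional feature vector via its WMD to R
    random documents (a random-features approximation of a WMD kernel)."""
    R = len(random_docs)
    feats = np.array([np.exp(-gamma * wmd(doc, w)) for w in random_docs])
    return feats / np.sqrt(R)


# Illustrative usage with toy word vectors (dim=50) and R=128 random
# documents whose "words" are sampled uniformly from the embedding space.
rng = np.random.default_rng(0)
dim, R = 50, 128
random_docs = [rng.uniform(-1, 1, size=(rng.integers(1, 6), dim))
               for _ in range(R)]
doc1 = rng.normal(size=(8, dim))  # stand-ins for pre-trained word vectors
doc2 = rng.normal(size=(5, dim))
v1 = wme_embedding(doc1, random_docs)
v2 = wme_embedding(doc2, random_docs)
# The inner product v1 @ v2 approximates the WMD-based kernel, so the
# embeddings can feed any linear classifier instead of an expensive KNN.
```

Because each embedding is a fixed-length vector, downstream models need only inner products, which is what lets WME move beyond the KNN setting the abstract mentions.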

Authors (8)
  1. Lingfei Wu (135 papers)
  2. Ian E. H. Yen (8 papers)
  3. Kun Xu (277 papers)
  4. Fangli Xu (17 papers)
  5. Avinash Balakrishnan (7 papers)
  6. Pin-Yu Chen (311 papers)
  7. Pradeep Ravikumar (101 papers)
  8. Michael J. Witbrock (1 paper)
Citations (103)