Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Word Embedding based Edit Distance (1810.10752v1)

Published 25 Oct 2018 in cs.CL

Abstract: Text similarity calculation is a fundamental problem in natural language processing and related fields. In recent years, deep neural networks have been developed to perform the task and high performances have been achieved. The neural networks are usually trained with labeled data in supervised learning, and creation of labeled data is usually very costly. In this short paper, we address unsupervised learning for text similarity calculation. We propose a new method called Word Embedding based Edit Distance (WED), which incorporates word embedding into edit distance. Experiments on three benchmark datasets show WED outperforms state-of-the-art unsupervised methods including edit distance, TF-IDF based cosine, word embedding based cosine, Jaccard index, etc.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Yilin Niu (10 papers)
  2. Chao Qiao (3 papers)
  3. Hang Li (277 papers)
  4. Minlie Huang (226 papers)
Citations (4)