
Learning Rare Word Representations using Semantic Bridging (1707.07554v1)

Published 24 Jul 2017 in cs.CL and cs.AI

Abstract: We propose a methodology that adapts graph embedding techniques (DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016)) as well as cross-lingual vector space mapping approaches (Least Squares and Canonical Correlation Analysis) in order to merge the corpus and ontological sources of lexical knowledge. We also perform comparative analysis of the used algorithms in order to identify the best combination for the proposed system. We then apply this to the task of enhancing the coverage of an existing word embedding's vocabulary with rare and unseen words. We show that our technique can provide considerable extra coverage (over 99%), leading to consistent performance gain (around 10% absolute gain is achieved with w2v-gn-500K, cf. Section 3.3) on the Rare Word Similarity dataset.
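To make the mapping idea in the abstract concrete, below is a minimal sketch (not the authors' code) of the least-squares variant: fit a linear map from ontology-graph embeddings (e.g. produced by DeepWalk or node2vec) to pre-trained corpus word embeddings using the vocabulary shared by both spaces, then project rare or unseen words through that map to impute word-space vectors for them. All variable names, dimensions, and the random toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 1000 words appear in both spaces, plus 50 rare
# words that only have graph (ontology) embeddings.
n_shared, n_rare = 1000, 50
d_graph, d_word = 128, 300

G_shared = rng.normal(size=(n_shared, d_graph))  # graph vectors, shared vocab
X_shared = rng.normal(size=(n_shared, d_word))   # word vectors, shared vocab
G_rare = rng.normal(size=(n_rare, d_graph))      # graph vectors, rare words only

# Least-squares linear map W minimising ||G_shared @ W - X_shared||_F^2.
W, *_ = np.linalg.lstsq(G_shared, X_shared, rcond=None)

# Impute word-space vectors for rare words by projecting their graph vectors.
X_rare_imputed = G_rare @ W
print(X_rare_imputed.shape)  # (50, 300)
```

The Canonical Correlation Analysis variant mentioned in the abstract would replace the least-squares fit with a CCA projection of both spaces into a shared space; the imputation step for rare words is otherwise analogous.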

Authors (5)
  1. Victor Prokhorov (9 papers)
  2. Mohammad Taher Pilehvar (43 papers)
  3. Dimitri Kartsaklis (24 papers)
  4. Nigel Collier (83 papers)
  5. Pietro Lió (16 papers)
Citations (5)
