
Learning Rare Word Representations using Semantic Bridging (1707.07554v1)

Published 24 Jul 2017 in cs.CL and cs.AI

Abstract: We propose a methodology that adapts graph embedding techniques (DeepWalk (Perozzi et al., 2014) and node2vec (Grover and Leskovec, 2016)) as well as cross-lingual vector space mapping approaches (Least Squares and Canonical Correlation Analysis) in order to merge the corpus and ontological sources of lexical knowledge. We also perform comparative analysis of the used algorithms in order to identify the best combination for the proposed system. We then apply this to the task of enhancing the coverage of an existing word embedding's vocabulary with rare and unseen words. We show that our technique can provide considerable extra coverage (over 99%), leading to consistent performance gain (around 10% absolute gain is achieved with w2v-gn-500K, cf. Section 3.3) on the Rare Word Similarity dataset.
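To make the mapping idea in the abstract concrete, below is a minimal sketch (not the authors' code) of the least-squares variant: fit a linear map from ontology-graph embeddings (e.g. produced by DeepWalk or node2vec) to pre-trained corpus word embeddings using the vocabulary shared by both spaces, then project rare or unseen words through that map to impute word-space vectors for them. All variable names, dimensions, and the random toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 1000 words appear in both spaces, plus 50 rare
# words that only have graph (ontology) embeddings.
n_shared, n_rare = 1000, 50
d_graph, d_word = 128, 300

G_shared = rng.normal(size=(n_shared, d_graph))  # graph vectors, shared vocab
X_shared = rng.normal(size=(n_shared, d_word))   # word vectors, shared vocab
G_rare = rng.normal(size=(n_rare, d_graph))      # graph vectors, rare words only

# Least-squares linear map W minimising ||G_shared @ W - X_shared||_F^2.
W, *_ = np.linalg.lstsq(G_shared, X_shared, rcond=None)

# Impute word-space vectors for rare words by projecting their graph vectors.
X_rare_imputed = G_rare @ W
print(X_rare_imputed.shape)  # (50, 300)
```

The Canonical Correlation Analysis variant mentioned in the abstract would replace the least-squares fit with a CCA projection of both spaces into a shared space; the imputation step for rare words is otherwise analogous.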

Authors (5)
  1. Victor Prokhorov (9 papers)
  2. Mohammad Taher Pilehvar (43 papers)
  3. Dimitri Kartsaklis (24 papers)
  4. Nigel Collier (83 papers)
  5. Pietro Lió (16 papers)
Citations (5)
