Word-Graph2vec: An efficient word embedding approach on word co-occurrence graph using random walk technique

Published 11 Jan 2023 in cs.CL | (2301.04312v6)

Abstract: Word embedding has become ubiquitous and is widely used in various NLP tasks, such as web retrieval, web semantic analysis, and machine translation, and so on. Unfortunately, training the word embedding in a relatively large corpus is prohibitively expensive. We propose a graph-based word embedding algorithm, called Word-Graph2vec, which converts the large corpus into a word co-occurrence graph, then takes the word sequence samples from this graph by randomly traveling and trains the word embedding on this sampling corpus in the end. We posit that because of the limited vocabulary, huge idioms, and fixed expressions in English, the size and density of the word co-occurrence graph change slightly with the increase in the training corpus. So that Word-Graph2vec has stable runtime on the large-scale data set, and its performance advantage becomes more and more obvious with the growth of the training corpus. Extensive experiments conducted on real-world datasets show that the proposed algorithm outperforms traditional Word2vec four to five times in terms of efficiency and two to three times than FastText, while the error generated by the random walk technique is small.

Abstract PDF HTML Upgrade to Chat

References (1)

MultiMedia, L.: Large text compression benchmark (2009)

Citations (1)

View on Semantic Scholar

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Generate Now

Continue Learning

We haven't generated follow-up questions for this paper yet.

Generate Now

Word-Graph2vec: An efficient word embedding approach on word co-occurrence graph using random walk technique

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Authors (7)

Collections

Word-Graph2vec: An efficient word embedding approach on word co-occurrence graph using random walk technique

Summary

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Open Problems

Continue Learning

Related Papers

Authors (7)

Collections