Word Sense Disambiguation for 158 Languages using Word Embeddings Only (2003.06651v1)

Published 14 Mar 2020 in cs.CL

Abstract: Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models have been developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). These are particularly useful for under-resourced languages that lack the resources for building either supervised or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. The models and the system are available online.
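The core pipeline the abstract describes — inducing sense clusters from a pre-trained embedding model, then matching a context against the induced senses — can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the toy vectors, the `induce_senses` and `disambiguate` helpers, and the greedy similarity-threshold clustering are all illustrative stand-ins for the authors' method and the real fastText vectors.

```python
import numpy as np

# Toy 2-D embedding space standing in for pre-trained fastText vectors
# (all words, vectors, and thresholds here are illustrative).
emb = {
    "bank":    np.array([0.7, 0.7]),
    "money":   np.array([1.0, 0.1]),
    "loan":    np.array([0.9, 0.2]),
    "deposit": np.array([0.95, 0.15]),
    "river":   np.array([0.1, 1.0]),
    "shore":   np.array([0.2, 0.9]),
    "water":   np.array([0.15, 0.95]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def induce_senses(word, k=6, link_threshold=0.9):
    """Group the word's k nearest neighbours into sense clusters:
    a neighbour joins a cluster only if it is similar to every member."""
    neighbours = sorted((w for w in emb if w != word),
                        key=lambda w: cos(emb[word], emb[w]), reverse=True)[:k]
    clusters = []
    for w in neighbours:
        for c in clusters:
            if all(cos(emb[w], emb[v]) >= link_threshold for v in c):
                c.append(w)
                break
        else:
            clusters.append([w])  # start a new sense cluster
    # one centroid vector per induced sense
    centroids = [np.mean([emb[w] for w in c], axis=0) for c in clusters]
    return centroids, clusters

def disambiguate(word, context_words):
    """Pick the induced sense whose centroid is closest to the context."""
    centroids, clusters = induce_senses(word)
    ctx = np.mean([emb[w] for w in context_words if w in emb], axis=0)
    best = max(range(len(centroids)), key=lambda i: cos(centroids[i], ctx))
    return clusters[best]

print(disambiguate("bank", ["money", "loan"]))   # finance-sense cluster
print(disambiguate("bank", ["river", "water"]))  # river-sense cluster
```

With real pre-trained vectors, the neighbour lookup and clustering would run over the full vocabulary of one of the 158 fastText models, but the knowledge-free principle is the same: the sense inventory comes from the embedding space alone, with no labelled data or lexical resource.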

Authors (10)
  1. Varvara Logacheva
  2. Denis Teslenko
  3. Artem Shelmanov
  4. Steffen Remus
  5. Dmitry Ustalov
  6. Andrey Kutuzov
  7. Ekaterina Artemova
  8. Chris Biemann
  9. Simone Paolo Ponzetto
  10. Alexander Panchenko
Citations (9)
