RuDSI: graph-based word sense induction dataset for Russian (2209.13750v1)

Published 28 Sep 2022 in cs.CL

Abstract: We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-driven (based on texts from the Russian National Corpus), with no external word senses imposed on annotators. Depending on the parameters of graph clustering, different derivative datasets can be produced from the raw annotation. We report the performance that several baseline WSI methods obtain on RuDSI and discuss possibilities for improving these scores.

Authors (4)
  1. Anna Aksenova
  2. Ekaterina Gavrishina
  3. Elisey Rykov
  4. Andrey Kutuzov
Citations (7)
