Elisey Rykov

RuDSI: graph-based word sense induction dataset for Russian

Sep 28, 2022

Anna Aksenova, Ekaterina Gavrishina, Elisey Rykov, Andrey Kutuzov

Abstract: We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-driven (based on texts from the Russian National Corpus), with no external word senses imposed on annotators. Depending on the parameters of graph clustering, different derivative datasets can be produced from the raw annotation. We report the performance that several baseline WSI methods obtain on RuDSI and discuss possibilities for improving these scores.

* TextGraphs-16 workshop at the COLING 2022 conference
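
The abstract notes that different clustering parameters yield different derivative datasets from the same raw annotation. The sketch below illustrates that general idea on a toy word usage graph. It is not the clustering procedure used for RuDSI: the edge weights, the usage IDs, the `induce_senses` helper, and the thresholded connected-components rule are all assumptions chosen for brevity.

```python
from collections import defaultdict

# Toy word usage graph (hypothetical data): nodes are usages of one target
# word, edge weights stand in for averaged annotator relatedness judgments.
EDGES = {
    ("u1", "u2"): 4.0,
    ("u2", "u3"): 3.0,
    ("u3", "u4"): 2.0,
    ("u4", "u5"): 4.0,
    ("u1", "u5"): 1.0,
}


def induce_senses(edges, threshold):
    """Keep edges with weight >= threshold and return the connected
    components of the remaining graph, each treated as one sense cluster."""
    nodes = sorted({node for pair in edges for node in pair})
    adjacency = defaultdict(set)
    for (a, b), weight in edges.items():
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects every usage reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# The same annotation produces different sense inventories per threshold.
for threshold in (2.0, 2.5, 3.5):
    print(threshold, induce_senses(EDGES, threshold))
```

With these toy weights, raising the threshold splits the single cluster into two and then three, which is the sense in which one set of raw judgments can back several derivative datasets.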
