Gulnara Kabaeva

HJ-Ky-0.1: an Evaluation Dataset for Kyrgyz Word Embeddings

Nov 16, 2024

Anton Alekseev, Gulnara Kabaeva

Abstract: One of the key tasks in modern applied computational linguistics is constructing vector representations of words (word embeddings), which are widely used in natural language processing tasks such as sentiment analysis and information extraction. Choosing a method for building these embeddings usually requires some way to assess their quality. A standard approach is to compute distances between the vectors of word pairs whose 'similarity' has been rated by experts and compare them with those ratings. This work introduces the first 'silver standard' dataset for this task in the Kyrgyz language, trains corresponding models, and validates the dataset's suitability through quality evaluation metrics.

* Herald of KSTU 68(4) (2023)
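As a rough illustration of the similarity-based evaluation described in the abstract, the sketch below scores each word pair by cosine similarity and compares the model's ranking of pairs with the expert ratings using Spearman's rank correlation. Both choices are common in word-similarity benchmarks but are assumptions here; the abstract does not state which measures HJ-Ky-0.1 uses. The words (`w1` to `w5`), vectors, and ratings are hypothetical placeholders, not entries from the dataset.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors (assumed similarity measure).
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def average_ranks(xs):
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    # Spearman's rho = Pearson correlation of the rank vectors.
    return pearson(average_ranks(xs), average_ranks(ys))

def evaluate(embeddings, pairs):
    """pairs: (word1, word2, gold_score) triples; pairs with unknown words are skipped."""
    model_scores, gold_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            gold_scores.append(gold)
    return spearman(model_scores, gold_scores), len(model_scores)

# Hypothetical toy data, not taken from HJ-Ky-0.1.
embeddings = {"w1": [1.0, 0.0], "w2": [0.9, 0.1], "w3": [0.0, 1.0], "w4": [0.5, 0.5]}
pairs = [("w1", "w2", 9.0), ("w1", "w4", 5.0), ("w1", "w3", 1.0), ("w2", "w5", 4.0)]

rho, covered = evaluate(embeddings, pairs)
print(f"Spearman rho = {rho:.3f} over {covered} pairs")
```

Here the three covered pairs are ranked in the same order by the model and by the ratings, so the correlation is 1.0 up to floating-point rounding, while the pair containing the unknown word `w5` is skipped. Reporting how many pairs were covered next to the score makes results from models with different vocabularies easier to compare.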

Benchmarking Multilabel Topic Classification in the Kyrgyz Language

Aug 30, 2023

Anton Alekseev, Sergey I. Nikolenko, Gulnara Kabaeva

Abstract: Kyrgyz is severely underrepresented in terms of modern natural language processing resources. In this work, we present a new public benchmark for topic classification in Kyrgyz: a dataset of news collected from the news site 24.KG and annotated with topics, together with several baseline models for news classification in the multilabel setting. We train and evaluate both classical statistical and neural models, report their scores, discuss the results, and propose directions for future work.

* Accepted to AIST 2023
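The abstract does not name the metrics reported, but multilabel classifiers are commonly compared using micro- and macro-averaged F1. The minimal sketch below computes both from per-document label sets, assuming each document's gold and predicted topics are given as Python sets; the topic names are hypothetical placeholders, not the benchmark's actual label set.

```python
def multilabel_f1(gold, pred, labels):
    """gold, pred: lists of label sets, one per document. Returns (micro_f1, macro_f1)."""
    tp = {l: 0 for l in labels}
    fp = {l: 0 for l in labels}
    fn = {l: 0 for l in labels}
    for g, p in zip(gold, pred):
        for l in labels:
            if l in p and l in g:
                tp[l] += 1      # predicted and correct
            elif l in p:
                fp[l] += 1      # predicted but not in gold
            elif l in g:
                fn[l] += 1      # in gold but missed

    def f1(t, f_p, f_n):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there is nothing to count.
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro

# Hypothetical topics and predictions for three documents.
labels = ["politics", "economy", "sport"]
gold = [{"politics"}, {"economy", "politics"}, {"sport"}]
pred = [{"politics"}, {"economy"}, {"sport", "economy"}]

micro, macro = multilabel_f1(gold, pred, labels)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Micro-averaging pools the counts over all labels, so frequent topics dominate the score; macro-averaging gives every topic equal weight, which makes it more sensitive to how well rare topics are handled.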
