Nikola Ljubešić

CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route

Dec 02, 2024
Via arXiv

LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification

Nov 29, 2024
Via arXiv

Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines

May 12, 2024
Via arXiv

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Via arXiv

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Mar 26, 2024
Via arXiv

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Via arXiv

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Nov 15, 2023
Via arXiv

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023
Via arXiv

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages

Aug 11, 2023
Via arXiv

Findings of the VarDial Evaluation Campaign 2023

May 31, 2023
Via arXiv