Taja Kuzman

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Via arXiv

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Mar 26, 2024
Via arXiv

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Via arXiv

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification

Mar 08, 2023
Via arXiv

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild

Jan 11, 2022
Via arXiv