Peter Rupnik

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Apr 08, 2024
Via arXiv

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

Mar 13, 2024
Via arXiv

The ParlaSent multilingual training dataset for sentiment identification in parliamentary proceedings

Sep 18, 2023
Via arXiv

The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia

Jun 02, 2022
Via arXiv

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild

Jan 11, 2022
Via arXiv