Hynek Kydlíček

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Feb 18, 2025

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Feb 04, 2025

Towards Best Practices for Open Datasets for LLM Training

Jan 14, 2025

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Jun 25, 2024

A Dataset and Strong Baselines for Classification of Czech News Texts

Jul 20, 2023