Hynek Kydlíček

Towards Best Practices for Open Datasets for LLM Training

Jan 14, 2025
Via arXiv

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Jun 25, 2024
Via arXiv

A Dataset and Strong Baselines for Classification of Czech News Texts

Jul 20, 2023
Via arXiv