Guilherme Penedo

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Jun 25, 2024
Via arXiv

The Falcon Series of Open Language Models

Nov 29, 2023
Via arXiv

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Jun 01, 2023
Via arXiv