
Thomas Wolf

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Jun 25, 2024 (via arXiv)

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024 (via arXiv)

GAIA: a benchmark for General AI Assistants

Nov 21, 2023 (via arXiv)

FinGPT: Large Generative Models for a Small Language

Nov 03, 2023 (via arXiv)

Zephyr: Direct Distillation of LM Alignment

Oct 25, 2023 (via arXiv)

Scaling Data-Constrained Language Models

May 25, 2023 (via arXiv)

StarCoder: may the source be with you!

May 09, 2023 (via arXiv)

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

Feb 06, 2023 (via arXiv)

The Stack: 3 TB of permissively licensed source code

Nov 20, 2022 (via arXiv)

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022 (via arXiv)