Thomas Wolf

SmolVLM: Redefining small and efficient multimodal models

Apr 07, 2025

YourBench: Easy Custom Evaluation Sets for Everyone

Apr 02, 2025

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Feb 04, 2025

Towards Best Practices for Open Datasets for LLM Training

Jan 14, 2025

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Jun 25, 2024

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024

GAIA: a benchmark for General AI Assistants

Nov 21, 2023

FinGPT: Large Generative Models for a Small Language

Nov 03, 2023

Zephyr: Direct Distillation of LM Alignment

Oct 25, 2023

Scaling Data-Constrained Language Models

May 25, 2023