Quentin Lhoest

Croissant: A Metadata Format for ML-Ready Datasets

Mar 28, 2024
Via arXiv

AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages

Apr 04, 2023
Via arXiv

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Mar 07, 2023
Via arXiv

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022
Via arXiv

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Oct 06, 2022
Via arXiv

Training Transformers Together

Jul 07, 2022
Via arXiv

Datasets: A Community Library for Natural Language Processing

Sep 07, 2021
Via arXiv

Distributed Deep Learning in Open Collaborations

Jun 18, 2021
Via arXiv