Picture for Angelina McMillan-Major

Angelina McMillan-Major

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Add code
Mar 07, 2023
Figure 1 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 2 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 3 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 4 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Viaarxiv icon

Measuring Data

Add code
Dec 09, 2022
Viaarxiv icon

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Add code
Nov 09, 2022
Viaarxiv icon

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Add code
Jun 24, 2022
Figure 1 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Figure 2 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Figure 3 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Figure 4 for GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
Viaarxiv icon

Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources

Add code
Jan 25, 2022
Figure 1 for Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Figure 2 for Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Figure 3 for Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Figure 4 for Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Viaarxiv icon

Datasets: A Community Library for Natural Language Processing

Add code
Sep 07, 2021
Figure 1 for Datasets: A Community Library for Natural Language Processing
Figure 2 for Datasets: A Community Library for Natural Language Processing
Figure 3 for Datasets: A Community Library for Natural Language Processing
Viaarxiv icon

Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards

Add code
Aug 16, 2021
Figure 1 for Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards
Figure 2 for Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards
Viaarxiv icon

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Add code
Feb 03, 2021
Figure 1 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Figure 2 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Figure 3 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Figure 4 for The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Viaarxiv icon