Manan Dey

Consent in Crisis: The Rapid Decline of the AI Data Commons

Jul 24, 2024
Via arXiv

StarCoder 2 and The Stack v2: The Next Generation

Feb 29, 2024
Via arXiv

StarCoder: may the source be with you!

May 09, 2023
Via arXiv

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Mar 07, 2023
Via arXiv

SantaCoder: don't reach for the stars!

Jan 09, 2023
Via arXiv

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022
Via arXiv

How sensitive are translation systems to extra contexts? Mitigating gender bias in Neural Machine Translation models through relevant contexts

May 22, 2022
Via arXiv

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

Feb 02, 2022
Via arXiv

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

Dec 20, 2021
Via arXiv

Multitask Prompted Training Enables Zero-Shot Task Generalization

Oct 15, 2021
Via arXiv