Isaac Caswell

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text

Nov 11, 2023

MADLAD-400: A Multilingual And Document-Level Large Audited Dataset

Sep 09, 2023

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

May 24, 2023

Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Mar 27, 2023

Building Machine Translation Systems for the Next Thousand Languages

May 16, 2022

Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning

Jan 13, 2022

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Mar 22, 2021

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

Oct 29, 2020

BLEU might be Guilty but References are not Innocent

Apr 13, 2020

Translationese as a Language in "Multilingual" NMT

Nov 10, 2019