François Yvon

TLP

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages

Oct 31, 2024

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

Oct 08, 2024

How Transliterations Improve Crosslingual Alignment

Sep 25, 2024

Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models

Sep 11, 2024

MaskLID: Code-Switching Language Identification through Iterative Masking

Jun 10, 2024

Optimizing example selection for retrieval-augmented machine translation with translation memories

May 23, 2024

Lessons from the Trenches on Reproducible Evaluation of Language Models

May 23, 2024

CroissantLLM: A Truly Bilingual French-English Language Model

Feb 02, 2024

GlotLID: Language Identification for Low-Resource Languages

Nov 04, 2023

Structural generalization in COGS: Supertagging is (almost) all you need

Oct 21, 2023