Mikel Artetxe

WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging

Feb 25, 2025

BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation

Feb 06, 2025

Linguini: A benchmark for language-agnostic linguistic reasoning

Sep 18, 2024

BertaQA: How Much Do Language Models Know About Local Culture?

Jun 11, 2024

Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

May 03, 2024

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Apr 18, 2024

Latxa: An Open Language Model and Evaluation Suite for Basque

Mar 29, 2024

Gender-specific Machine Translation with Large Language Models

Sep 06, 2023

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

Aug 31, 2023

Evaluation of Faithfulness Using the Longest Supported Subsequence

Aug 23, 2023