Picture for Marzieh Fadaee

Marzieh Fadaee

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

Add code
Apr 16, 2025
Viaarxiv icon

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Add code
Apr 09, 2025
Viaarxiv icon

Command A: An Enterprise-Ready Large Language Model

Add code
Apr 01, 2025
Viaarxiv icon

From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Add code
Feb 19, 2025
Viaarxiv icon

Towards Best Practices for Open Datasets for LLM Training

Add code
Jan 14, 2025
Viaarxiv icon

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

Add code
Dec 05, 2024
Figure 1 for Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
Figure 2 for Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
Figure 3 for Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
Figure 4 for Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier
Viaarxiv icon

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Add code
Dec 04, 2024
Figure 1 for Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Figure 2 for Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Figure 3 for Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Figure 4 for Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Viaarxiv icon

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Add code
Nov 29, 2024
Figure 1 for INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Figure 2 for INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Figure 3 for INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Figure 4 for INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Viaarxiv icon

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Add code
Oct 20, 2024
Figure 1 for M-RewardBench: Evaluating Reward Models in Multilingual Settings
Figure 2 for M-RewardBench: Evaluating Reward Models in Multilingual Settings
Figure 3 for M-RewardBench: Evaluating Reward Models in Multilingual Settings
Figure 4 for M-RewardBench: Evaluating Reward Models in Multilingual Settings
Viaarxiv icon

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning

Add code
Oct 14, 2024
Figure 1 for Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Figure 2 for Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Figure 3 for Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Figure 4 for Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
Viaarxiv icon