Leshem Choshen

DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

Mar 04, 2025
Via arXiv

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

Feb 26, 2025
Via arXiv

Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families

Dec 09, 2024
Via arXiv

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Dec 06, 2024
Via arXiv

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Dec 04, 2024
Via arXiv

ZipNN: Lossless Compression for AI Models

Nov 07, 2024
Via arXiv

Model merging with SVD to tie the Knots

Oct 25, 2024
Via arXiv

A Hitchhiker's Guide to Scaling Law Estimation

Oct 15, 2024
Via arXiv

Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity

Aug 22, 2024
Via arXiv

Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs

Aug 20, 2024
Via arXiv