Yotam Perlitz

DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation

Mar 04, 2025
Via arXiv

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

Feb 26, 2025
Via arXiv

JuStRank: Benchmarking LLM Judges for System Ranking

Dec 12, 2024
Via arXiv

Can You Trust Your Metric? Automatic Concatenation-Based Tests for Metric Validity

Aug 22, 2024
Via arXiv

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

Jul 18, 2024
Via arXiv

Holmes: Benchmark the Linguistic Competence of Language Models

Apr 29, 2024
Via arXiv

Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

Jan 25, 2024
Via arXiv

Efficient Benchmarking (of Language Models)

Aug 31, 2023
Via arXiv

Active Learning for Natural Language Generation

May 24, 2023
Via arXiv

nBIIG: A Neural BI Insights Generation System for Table Reporting

Nov 08, 2022
Via arXiv