Asaf Yehudai

Survey on Evaluation of LLM-based Agents

Mar 20, 2025
Via arXiv

WildIFEval: Instruction Following in the Wild

Mar 09, 2025
Via arXiv

The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

Feb 26, 2025
Via arXiv

Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models

Feb 12, 2025
Via arXiv

JuStRank: Benchmarking LLM Judges for System Ranking

Dec 12, 2024
Via arXiv

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Sep 07, 2024
Via arXiv

Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation

Jul 18, 2024
Via arXiv

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation

Jun 02, 2024
Via arXiv

A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns

May 23, 2024
Via arXiv

When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes

Apr 18, 2024
Via arXiv