Shunji Wan

VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

Jun 26, 2024
Via arXiv