Arjun Panickssery

Analyzing Probabilistic Methods for Evaluating Agent Capabilities

Sep 24, 2024
Via arXiv

Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

Jul 04, 2024
Via arXiv

LLM Evaluators Recognize and Favor Their Own Generations

Apr 15, 2024
Via arXiv

REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Jan 11, 2024
Via arXiv