Suhas Hariharan

Rethinking CyberSecEval: An LLM-Aided Approach to Evaluation Critique

Nov 13, 2024
Via arXiv

GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents

Jun 07, 2024
Via arXiv