Hongyi Du

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Mar 03, 2025
Via arXiv

EscapeBench: Pushing Language Models to Think Outside the Box

Dec 18, 2024
Via arXiv