Kejuan Yang

SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories

Sep 11, 2024, via arXiv

AgentBench: Evaluating LLMs as Agents

Aug 07, 2023, via arXiv

Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration

May 24, 2023, via arXiv