Noah Shinn

$τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

Jun 17, 2024
Via arXiv

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

Dec 29, 2023
Via arXiv

Type Prediction With Program Decomposition and Fill-in-the-Type Training

May 25, 2023
Via arXiv

Reflexion: an autonomous agent with dynamic memory and self-reflection

Mar 20, 2023
Via arXiv