Zachary S. Siegel

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Sep 17, 2024
Via arXiv

BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

Jul 16, 2024
Via arXiv

AI Agents That Matter

Jul 01, 2024
Via arXiv

Learning adaptive planning representations with natural language guidance

Dec 13, 2023
Via arXiv