Sayash Kapoor

Toward an Evaluation Science for Generative AI Systems

Mar 07, 2025, via arXiv

International AI Safety Report

Jan 29, 2025, via arXiv

Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers

Dec 02, 2024, via arXiv

The Reality of AI and Biorisk

Dec 02, 2024, via arXiv

Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers

Nov 26, 2024, via arXiv

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Sep 17, 2024, via arXiv

The Foundation Model Transparency Index v1.1: May 2024

Jul 17, 2024, via arXiv

AI Agents That Matter

Jul 01, 2024, via arXiv

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Jun 26, 2024, via arXiv

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024, via arXiv