Picture for Sayash Kapoor

Sayash Kapoor

Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers

Add code
Dec 02, 2024
Figure 1 for Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Figure 2 for Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Figure 3 for Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Figure 4 for Inference Scaling fLaws: The Limits of LLM Resampling with Imperfect Verifiers
Viaarxiv icon

The Reality of AI and Biorisk

Add code
Dec 02, 2024
Viaarxiv icon

Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers

Add code
Nov 26, 2024
Figure 1 for Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers
Figure 2 for Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers
Figure 3 for Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers
Figure 4 for Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers
Viaarxiv icon

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Add code
Sep 17, 2024
Figure 1 for CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Figure 2 for CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Figure 3 for CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Figure 4 for CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark
Viaarxiv icon

The Foundation Model Transparency Index v1.1: May 2024

Add code
Jul 17, 2024
Viaarxiv icon

AI Agents That Matter

Add code
Jul 01, 2024
Viaarxiv icon

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Add code
Jun 26, 2024
Viaarxiv icon

A Safe Harbor for AI Evaluation and Red Teaming

Add code
Mar 07, 2024
Figure 1 for A Safe Harbor for AI Evaluation and Red Teaming
Figure 2 for A Safe Harbor for AI Evaluation and Red Teaming
Figure 3 for A Safe Harbor for AI Evaluation and Red Teaming
Figure 4 for A Safe Harbor for AI Evaluation and Red Teaming
Viaarxiv icon

On the Societal Impact of Open Foundation Models

Add code
Feb 27, 2024
Figure 1 for On the Societal Impact of Open Foundation Models
Figure 2 for On the Societal Impact of Open Foundation Models
Viaarxiv icon

Foundation Model Transparency Reports

Add code
Feb 26, 2024
Viaarxiv icon