William Saunders

RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts

Nov 22, 2024
Via arXiv

Transformer Circuit Faithfulness Metrics are not Robust

Jul 11, 2024
Via arXiv

Self-critiquing models for assisting human evaluators

Jun 14, 2022
Via arXiv

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022
Via arXiv

WebGPT: Browser-assisted question-answering with human feedback

Dec 17, 2021
Via arXiv

Truthful AI: Developing and governing AI that does not lie

Oct 13, 2021
Via arXiv

Evaluating Large Language Models Trained on Code

Jul 14, 2021
Via arXiv

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

Jul 17, 2017
Via arXiv