John Schulman

Measuring short-form factuality in large language models

Nov 07, 2024

Rule Based Rewards for Language Model Safety

Nov 02, 2024

GPT-4o System Card

Oct 25, 2024

Let's Verify Step by Step

May 31, 2023

Scaling laws for single-agent reinforcement learning

Jan 31, 2023

Scaling Laws for Reward Model Overoptimization

Oct 19, 2022

Efficient Training of Language Models to Fill in the Middle

Jul 28, 2022

Training language models to follow instructions with human feedback

Mar 04, 2022

WebGPT: Browser-assisted question-answering with human feedback

Dec 17, 2021

Training Verifiers to Solve Math Word Problems

Nov 18, 2021