Jonathan Uesato

Tony

OpenAI o1 System Card

Dec 21, 2024
Via arXiv

Alignment faking in large language models

Dec 18, 2024
Via arXiv

GPT-4o System Card

Oct 25, 2024
Via arXiv

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023
Via arXiv

Solving math word problems with process- and outcome-based feedback

Nov 25, 2022
Via arXiv

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022
Via arXiv

Improving alignment of dialogue agents via targeted human judgements

Sep 28, 2022
Via arXiv

Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

Jun 16, 2022
Via arXiv

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Dec 08, 2021
Via arXiv

Ethical and social risks of harm from Language Models

Dec 08, 2021
Via arXiv