Picture for Matthew Rahtz

Matthew Rahtz

Imagen 3

Add code
Aug 13, 2024
Viaarxiv icon

Gemma 2: Improving Open Language Models at a Practical Size

Add code
Aug 02, 2024
Figure 1 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 2 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 3 for Gemma 2: Improving Open Language Models at a Practical Size
Figure 4 for Gemma 2: Improving Open Language Models at a Practical Size
Viaarxiv icon

A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Add code
Apr 23, 2024
Viaarxiv icon

Evaluating Frontier Models for Dangerous Capabilities

Add code
Mar 20, 2024
Figure 1 for Evaluating Frontier Models for Dangerous Capabilities
Figure 2 for Evaluating Frontier Models for Dangerous Capabilities
Figure 3 for Evaluating Frontier Models for Dangerous Capabilities
Figure 4 for Evaluating Frontier Models for Dangerous Capabilities
Viaarxiv icon

Gemini: A Family of Highly Capable Multimodal Models

Add code
Dec 19, 2023
Viaarxiv icon

The Hydra Effect: Emergent Self-repair in Language Model Computations

Add code
Jul 28, 2023
Viaarxiv icon

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Add code
Jul 24, 2023
Figure 1 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 2 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 3 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 4 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Viaarxiv icon

Tracr: Compiled Transformers as a Laboratory for Interpretability

Add code
Jan 12, 2023
Viaarxiv icon

Safe Deep RL in 3D Environments using Human Feedback

Add code
Jan 21, 2022
Figure 1 for Safe Deep RL in 3D Environments using Human Feedback
Figure 2 for Safe Deep RL in 3D Environments using Human Feedback
Figure 3 for Safe Deep RL in 3D Environments using Human Feedback
Figure 4 for Safe Deep RL in 3D Environments using Human Feedback
Viaarxiv icon

An Extensible Interactive Interface for Agent Design

Add code
Jun 10, 2019
Figure 1 for An Extensible Interactive Interface for Agent Design
Figure 2 for An Extensible Interactive Interface for Agent Design
Figure 3 for An Extensible Interactive Interface for Agent Design
Figure 4 for An Extensible Interactive Interface for Agent Design
Viaarxiv icon