Olivia Watkins

Tony

OpenAI o1 System Card

Dec 21, 2024

GPT-4o System Card

Oct 25, 2024

A StrongREJECT for Empty Jailbreaks

Feb 15, 2024

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

Nov 02, 2023

Learning to Model the World with Language

Jul 31, 2023

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

May 25, 2023

Aligning Text-to-Image Models using Human Feedback

Feb 23, 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

Feb 13, 2023

Teachable Reinforcement Learning via Advice Distillation

Mar 19, 2022

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

Jan 29, 2022