Anca Dragan

Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning

Nov 07, 2024
Via arXiv

Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations

Nov 07, 2024
Via arXiv

Learning to Assist Humans without Inferring Rewards

Nov 04, 2024
Via arXiv

Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback

Nov 04, 2024
Via arXiv

Trajectory Improvement and Reward Learning from Comparative Language Feedback

Oct 08, 2024
Via arXiv

Imagen 3

Aug 13, 2024
Via arXiv

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Aug 09, 2024
Via arXiv

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

Jun 24, 2024
Via arXiv

Adversaries Can Misuse Combinations of Safe Models

Jun 20, 2024
Via arXiv