Kyle O'Brien

Steering Language Model Refusal with Sparse Autoencoders

Nov 18, 2024
Via arXiv

Composable Interventions for Language Models

Jul 09, 2024
Via arXiv

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

Jun 25, 2024
Via arXiv

Improving Black-box Robustness with In-Context Rewriting

Feb 15, 2024
Via arXiv

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Apr 03, 2023
Via arXiv