Benjamin Wright

Alignment faking in large language models

Dec 18, 2024
Via arXiv

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Jul 31, 2024
Via arXiv

SketchOGD: Memory-Efficient Continual Learning

May 25, 2023
Via arXiv