Picture for Caden Juang

Caden Juang

Automatically Interpreting Millions of Features in Large Language Models

Add code
Oct 17, 2024
Viaarxiv icon

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Add code
Jul 18, 2024
Figure 1 for NNsight and NDIF: Democratizing Access to Foundation Model Internals
Figure 2 for NNsight and NDIF: Democratizing Access to Foundation Model Internals
Figure 3 for NNsight and NDIF: Democratizing Access to Foundation Model Internals
Figure 4 for NNsight and NDIF: Democratizing Access to Foundation Model Internals
Viaarxiv icon

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals

Add code
May 11, 2024
Viaarxiv icon