Wes Gurnee

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

Aug 26, 2024
Via arXiv

The Remarkable Robustness of LLMs: Stages of Inference?

Jun 27, 2024
Via arXiv

Confidence Regulation Neurons in Language Models

Jun 24, 2024
Via arXiv

Refusal in Language Models Is Mediated by a Single Direction

Jun 17, 2024
Via arXiv

Not All Language Model Features Are Linear

May 23, 2024
Via arXiv

Universal Neurons in GPT2 Language Models

Jan 22, 2024
Via arXiv

Training Dynamics of Contextual N-Grams in Language Models

Nov 01, 2023
Via arXiv

Language Models Represent Space and Time

Oct 03, 2023
Via arXiv

Finding Neurons in a Haystack: Case Studies with Sparse Probing

May 02, 2023
Via arXiv

Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization

Jun 01, 2022
Via arXiv