Aaron Mueller

Characterizing the Role of Similarity in the Property Inferences of Language Models

Oct 29, 2024
Via arXiv

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics

Oct 28, 2024
Via arXiv

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Aug 02, 2024
Via arXiv

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Jul 18, 2024
Via arXiv

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks

Jul 05, 2024
Via arXiv

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Apr 09, 2024
Via arXiv

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Mar 31, 2024
Via arXiv

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

Nov 13, 2023
Via arXiv

Function Vectors in Large Language Models

Oct 23, 2023
Via arXiv

Meta-training with Demonstration Retrieval for Efficient Few-shot Learning

Jun 30, 2023
Via arXiv