Asma Ghandeharioun

Towards Unifying Interpretability and Control: Evaluation via Intervention

Nov 07, 2024
Via arXiv

Racing Thoughts: Explaining Large Language Model Contextualization Errors

Oct 02, 2024
Via arXiv

When Can Transformers Count to n?

Jul 21, 2024
Via arXiv

Who's asking? User personas and the mechanics of latent misalignment

Jun 17, 2024
Via arXiv

Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

Jan 12, 2024
Via arXiv

Interpretability Illusions in the Generalization of Simplified Models

Dec 06, 2023
Via arXiv

Post Hoc Explanations of Language Models Can Improve Language Models

May 19, 2023
Via arXiv

Mixed Effects Random Forests for Personalised Predictions of Clinical Depression Severity

Jan 24, 2023
Via arXiv

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Jan 10, 2023
Via arXiv

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals

May 31, 2021
Via arXiv