Adam Davies

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models

Nov 09, 2024
Via arXiv

Focus On This, Not That! Steering LLMs With Adaptive Feature Specification

Oct 30, 2024
Via arXiv

Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments

Oct 03, 2024
Via arXiv

Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions

Aug 28, 2024
Via arXiv

The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms

Aug 11, 2024
Via arXiv

Competence-Based Analysis of Language Models

Mar 01, 2023
Via arXiv

Not Just Pretty Pictures: Text-to-Image Generators Enable Interpretable Interventions for Robust Representations

Dec 21, 2022
Via arXiv