David Krueger

Probabilistic Modelling is Sufficient for Causal Inference

Dec 29, 2025
Via arXiv

Language models' activations linearly encode training-order recency

Sep 17, 2025
Via arXiv

Rethinking Safety in LLM Fine-tuning: An Optimization Perspective

Aug 17, 2025
Via arXiv

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations

Aug 07, 2025
Via arXiv

Distributional Training Data Attribution

Jun 15, 2025
Via arXiv

Detecting High-Stakes Interactions with Activation Probes

Jun 12, 2025
Via arXiv

From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization

May 28, 2025
Via arXiv

Understanding (Un)Reliability of Steering Vectors in Language Models

May 28, 2025
Via arXiv

Interpreting Emergent Planning in Model-Free Reinforcement Learning

Apr 02, 2025
Via arXiv

Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

Feb 27, 2025
Via arXiv