David Krueger

Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models

Feb 27, 2025
Via arXiv

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025
Via arXiv

Learning to Forget using Hypernetworks

Dec 01, 2024
Via arXiv

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

Nov 11, 2024
Via arXiv

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

Nov 07, 2024
Via arXiv

Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

Nov 07, 2024
Via arXiv

Predicting Future Actions of Reinforcement Learning Agents

Oct 29, 2024
Via arXiv

Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Oct 27, 2024
Via arXiv

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

Oct 22, 2024
Via arXiv

Influence Functions for Scalable Data Attribution in Diffusion Models

Oct 17, 2024
Via arXiv