David Krueger

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

Nov 11, 2024

Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

Nov 07, 2024

Noisy Zero-Shot Coordination: Breaking The Common Knowledge Assumption In Zero-Shot Coordination Games

Nov 07, 2024

Predicting Future Actions of Reinforcement Learning Agents

Oct 29, 2024

Integrating uncertainty quantification into randomized smoothing based robustness guarantees

Oct 27, 2024

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

Oct 22, 2024

Influence Functions for Scalable Data Attribution in Diffusion Models

Oct 17, 2024

Analyzing (In)Abilities of SAEs via Formal Languages

Oct 15, 2024

PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning

Oct 11, 2024

Exploring the design space of deep-learning-based weather forecasting systems

Oct 09, 2024