Ekdeep Singh Lubana

Abrupt Learning in Transformers: A Case Study on Matrix Completion

Oct 29, 2024
Via arXiv

Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing

Oct 22, 2024
Via arXiv

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

Oct 22, 2024
Via arXiv

Analyzing (In)Abilities of SAEs via Formal Languages

Oct 15, 2024
Via arXiv

Dynamics of Concept Learning and Compositional Generalization

Oct 10, 2024
Via arXiv

A Percolation Model of Emergence: Analyzing Transformers Trained on a Formal Language

Aug 22, 2024
Via arXiv

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

Jul 16, 2024
Via arXiv

What Makes and Breaks Safety Fine-tuning? Mechanistic Study

Jul 14, 2024
Via arXiv

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Jun 27, 2024
Via arXiv

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Via arXiv