Robert Kirk

Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction

Feb 24, 2025
Via arXiv

How Do Large Language Monkeys Get Their Power (Laws)?

Feb 24, 2025
Via arXiv

Fundamental Limitations in Defending LLM Finetuning APIs

Feb 20, 2025
Via arXiv

Investigating Non-Transitivity in LLM-as-a-Judge

Feb 19, 2025
Via arXiv

Open Problems in Machine Unlearning for AI Safety

Jan 09, 2025
Via arXiv

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Nov 19, 2024
Via arXiv

Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

Jul 17, 2024
Via arXiv

Leading the Pack: N-player Opponent Shaping

Dec 26, 2023
Via arXiv

Generalization to New Sequential Decision Making Tasks with In-Context Learning

Dec 06, 2023
Via arXiv

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Nov 21, 2023
Via arXiv