Roger Grosse

Sabotage Evaluations for Frontier Models

Oct 28, 2024
Via arXiv

Measuring Stochastic Data Complexity with Boltzmann Influence Functions

Jun 04, 2024
Via arXiv

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

May 22, 2024
Via arXiv

Training Data Attribution via Approximate Unrolled Differentiation

May 21, 2024
Via arXiv

Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

Apr 26, 2024
Via arXiv

REFACTOR: Learning to Extract Theorems from Proofs

Feb 26, 2024
Via arXiv

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Jan 17, 2024
Via arXiv

Studying Large Language Model Generalization with Influence Functions

Aug 07, 2023
Via arXiv

Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

Mar 13, 2023
Via arXiv

Efficient Parametric Approximations of Neural Network Function Space Distance

Feb 07, 2023
Via arXiv