Harry Mayne

Can sparse autoencoders be used to decompose and interpret steering vectors?

Nov 13, 2024
Via arXiv

Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

Nov 10, 2024
Via arXiv

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Jun 11, 2024
Via arXiv

Unsupervised Learning Approaches for Identifying ICU Patient Subgroups: Do Results Generalise?

Mar 05, 2024
Via arXiv