Yushi Yang

Can sparse autoencoders be used to decompose and interpret steering vectors?

Nov 13, 2024
Via arXiv

Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

Nov 10, 2024
Via arXiv

Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering

Aug 15, 2024
Via arXiv