Hannah Cyberey

Aligning Language Model Benchmarks with Pairwise Preferences

Feb 02, 2026
Via arXiv

White-Box Sensitivity Auditing with Steering Vectors

Jan 23, 2026
Via arXiv

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control

Apr 23, 2025
Via arXiv

Sensing and Steering Stereotypes: Extracting and Applying Gender Representation Vectors in LLMs

Feb 27, 2025
Via arXiv