Peixuan Han

Internal Activation as the Polar Star for Steering Unsafe LLM Behavior

Feb 04, 2025
Via arXiv

EscapeBench: Pushing Language Models to Think Outside the Box

Dec 18, 2024
Via arXiv

Distributionally Robust Unsupervised Dense Retrieval Training on Web Graphs

Oct 26, 2023
Via arXiv