Picture for Zeqing He

Zeqing He

Towards LLM Guardrails via Sparse Representation Steering

Add code
Mar 21, 2025
Viaarxiv icon

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

Add code
Mar 09, 2025
Viaarxiv icon