Picture for Huiyu Xu

Huiyu Xu

Towards LLM Guardrails via Sparse Representation Steering

Add code
Mar 21, 2025
Viaarxiv icon

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

Add code
Mar 09, 2025
Viaarxiv icon

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

Add code
Jul 23, 2024
Viaarxiv icon