Haebin Seong

SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

Feb 18, 2025
Via arXiv

Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems

Oct 17, 2024
Via arXiv

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Oct 02, 2024
Via arXiv