Haebin Seong

Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems

Oct 17, 2024
Via arXiv

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models

Oct 02, 2024
Via arXiv