Picture for Junxiao Yang

Junxiao Yang

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Add code
Jul 03, 2024
Viaarxiv icon

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Add code
Nov 15, 2023
Viaarxiv icon