Picture for Yuliang Sun

Yuliang Sun

Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack

Add code
Jun 17, 2024
Figure 1 for Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
Figure 2 for Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
Figure 3 for Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
Figure 4 for Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack
Viaarxiv icon

WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models

Add code
Nov 13, 2023
Viaarxiv icon