Junxiao Yang

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

Feb 24, 2025
Via arXiv

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Dec 19, 2024
Via arXiv

Global Challenge for Safe and Secure LLMs Track 1

Nov 21, 2024
Via arXiv

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Jul 03, 2024
Via arXiv

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Nov 15, 2023
Via arXiv