Zhexin Zhang

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement

Feb 24, 2025

LongSafety: Evaluating Long-Context Safety of Large Language Models

Feb 24, 2025

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Dec 19, 2024

Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework

Dec 16, 2024

Global Challenge for Safe and Secure LLMs Track 1

Nov 21, 2024

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Jul 03, 2024

Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack

Jun 17, 2024

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

Feb 26, 2024

Unveiling the Implicit Toxicity in Large Language Models

Nov 29, 2023

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Nov 15, 2023