Zhexin Zhang

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Dec 19, 2024

Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework

Dec 16, 2024

Global Challenge for Safe and Secure LLMs Track 1

Nov 21, 2024

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Jul 03, 2024

Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack

Jun 17, 2024

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

Feb 26, 2024

Unveiling the Implicit Toxicity in Large Language Models

Nov 29, 2023

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Nov 15, 2023

SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions

Sep 13, 2023

Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

Jul 10, 2023