Zhenhong Zhou

CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models
Feb 20, 2025

DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
Feb 18, 2025

Reinforced Lifelong Editing for Language Models
Feb 09, 2025

Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
Dec 18, 2024

On the Role of Attention Heads in Large Language Model Safety
Oct 17, 2024

Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions
Aug 14, 2024

Course-Correction: Safety Alignment Using Synthetic Preferences
Jul 23, 2024

How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
Jun 09, 2024

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
Feb 27, 2024

Quantifying and Analyzing Entity-level Memorization in Large Language Models
Aug 30, 2023