Yige Li

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Oct 28, 2024

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Oct 25, 2024

AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models

Oct 07, 2024

Adversarial Suffixes May Be Features Too!

Oct 01, 2024

Do Influence Functions Work on Large Language Models?

Sep 30, 2024

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

Aug 23, 2024

Multi-Trigger Backdoor Attacks: More Triggers, More Threats

Jan 27, 2024

End-to-End Anti-Backdoor Learning on Images and Time Series

Jan 06, 2024

Reconstructive Neuron Pruning for Backdoor Defense

May 24, 2023

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

Oct 25, 2021