Yige Li

Backdoor Token Unlearning: Exposing and Defending Backdoors in Pretrained Language Models

Jan 05, 2025
Via arXiv

CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization

Nov 18, 2024
Via arXiv

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Oct 28, 2024
Via arXiv

Expose Before You Defend: Unifying and Enhancing Backdoor Defenses via Exposed Models

Oct 25, 2024
Via arXiv

AnyAttack: Towards Large-scale Self-supervised Generation of Targeted Adversarial Examples for Vision-Language Models

Oct 07, 2024
Via arXiv

Adversarial Suffixes May Be Features Too!

Oct 01, 2024
Via arXiv

Do Influence Functions Work on Large Language Models?

Sep 30, 2024
Via arXiv

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models

Aug 23, 2024
Via arXiv

Multi-Trigger Backdoor Attacks: More Triggers, More Threats

Jan 27, 2024
Via arXiv

End-to-End Anti-Backdoor Learning on Images and Time Series

Jan 06, 2024
Via arXiv