Tinghao Xie

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024

Fantastic Copyrighted Beasts and How (Not) to Generate Them

Jun 20, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Oct 05, 2023

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

Aug 23, 2023

Fight Poison with Poison: Detecting Backdoor Poison Samples via Decoupling Benign Correlations

May 26, 2022

Circumventing Backdoor Defenses That Are Based on Latent Separability

May 26, 2022

Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks

Nov 25, 2021