Sizhe Chen

Aligning LLMs to Be Robust Against Prompt Injection

Oct 07, 2024
Via arXiv

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

Jan 08, 2024
Via arXiv

Can LLMs Follow Simple Rules?

Nov 06, 2023
Via arXiv

Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective

Feb 23, 2023
Via arXiv

Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

Nov 22, 2022
Via arXiv

Unifying Gradients to Improve Real-world Robustness for Deep Networks

Aug 12, 2022
Via arXiv

One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks

May 24, 2022
Via arXiv

Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

May 24, 2022
Via arXiv

Subspace Adversarial Training

Nov 24, 2021
Via arXiv

Dominant Patterns: Critical Features Hidden in Deep Neural Networks

May 31, 2021
Via arXiv