Zi Liang

New Paradigm of Adversarial Training: Breaking Inherent Trade-Off between Accuracy and Robustness via Dummy Classes

Oct 16, 2024

Alignment-Aware Model Extraction Attacks on Large Language Models

Sep 04, 2024

Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models

Aug 05, 2024

MERGE: Fast Private Text Generation

May 25, 2023

Healing Unsafe Dialogue Responses with Weak Supervision Signals

May 25, 2023

Multi-Action Dialog Policy Learning from Logged User Feedback

Feb 27, 2023