
Michael-Andrei Panaitescu-Liess

AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment

Oct 15, 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models

Sep 01, 2024

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

Jul 24, 2024

More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes

Aug 02, 2023

Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes

Nov 11, 2021