Jiongxiao Wang

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Nov 05, 2024

FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks

Oct 28, 2024

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Jun 30, 2024

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

May 17, 2024

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024

Preference Poisoning Attacks on Reward Model Learning

Feb 02, 2024

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations

Nov 16, 2023

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models

Nov 16, 2023

On the Exploitability of Instruction Tuning

Jun 28, 2023

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback

May 29, 2023