Bochuan Cao

Data Free Backdoor Attacks

Dec 09, 2024

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion Models

Oct 28, 2024

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Jun 04, 2024

XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution

May 30, 2024

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

May 28, 2024

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response

May 22, 2024

On the Difficulty of Defending Contrastive Learning against Backdoor Attacks

Dec 14, 2023

Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections

Nov 15, 2023

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

Oct 30, 2023

On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

Oct 02, 2023