Tiansheng Huang

$H^3$Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs

Nov 26, 2024

Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation

Oct 13, 2024

LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity

Oct 04, 2024

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey

Sep 26, 2024

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Sep 04, 2024

Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning

Aug 18, 2024

Personalized Privacy Protection Mask Against Unauthorized Facial Recognition

Jul 19, 2024

Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

May 28, 2024

Robust Few-Shot Ensemble Learning with Focal Diversity-Based Pruning

Apr 05, 2024