Tiansheng Huang

Multi-Agent Reinforcement Learning with Focal Diversity Optimization

Feb 06, 2025

TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs

Jan 31, 2025

Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation

Jan 30, 2025

Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

Jan 29, 2025

$H^3$Fusion: Helpful, Harmless, Honest Fusion of Aligned LLMs

Nov 26, 2024

Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation

Oct 13, 2024

LLM-TOPLA: Efficient LLM Ensemble by Maximizing Diversity

Oct 04, 2024

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey

Sep 26, 2024

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Sep 04, 2024

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Sep 03, 2024