Tony T. Wang

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Jun 28, 2024

Can Go AIs be adversarially robust?

Jun 18, 2024

Forbidden Facts: An Investigation of Competing Objectives in Llama-2

Dec 31, 2023

Cliff-Learning

Feb 14, 2023