Bhrugu Bharathi

Tamper-Resistant Safeguards for Open-Weight LLMs

Aug 01, 2024
Via arXiv

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Via arXiv