Lennart Justen

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Via arXiv

Will releasing the weights of future large language models grant widespread access to pandemic agents?

Nov 01, 2023
Via arXiv

No Time Like the Present: Effects of Language Change on Automated Comment Moderation

Jul 08, 2022
Via arXiv