Lennart Justen

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Via arXiv

Will releasing the weights of future large language models grant widespread access to pandemic agents?

Nov 01, 2023
Via arXiv

No Time Like the Present: Effects of Language Change on Automated Comment Moderation

Jul 08, 2022
Via arXiv