Picture for Dan Hendrycks

Dan Hendrycks

UC Berkeley

Humanity's Last Exam

Add code
Jan 24, 2025
Viaarxiv icon

Introduction to AI Safety, Ethics, and Society

Add code
Nov 01, 2024
Viaarxiv icon

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Add code
Oct 11, 2024
Viaarxiv icon

LLM-PBE: Assessing Data Privacy in Large Language Models

Add code
Aug 23, 2024
Figure 1 for LLM-PBE: Assessing Data Privacy in Large Language Models
Figure 2 for LLM-PBE: Assessing Data Privacy in Large Language Models
Figure 3 for LLM-PBE: Assessing Data Privacy in Large Language Models
Figure 4 for LLM-PBE: Assessing Data Privacy in Large Language Models
Viaarxiv icon

Tamper-Resistant Safeguards for Open-Weight LLMs

Add code
Aug 01, 2024
Figure 1 for Tamper-Resistant Safeguards for Open-Weight LLMs
Figure 2 for Tamper-Resistant Safeguards for Open-Weight LLMs
Figure 3 for Tamper-Resistant Safeguards for Open-Weight LLMs
Figure 4 for Tamper-Resistant Safeguards for Open-Weight LLMs
Viaarxiv icon

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Add code
Jul 31, 2024
Viaarxiv icon

Improving Alignment and Robustness with Circuit Breakers

Add code
Jun 10, 2024
Figure 1 for Improving Alignment and Robustness with Circuit Breakers
Figure 2 for Improving Alignment and Robustness with Circuit Breakers
Figure 3 for Improving Alignment and Robustness with Circuit Breakers
Figure 4 for Improving Alignment and Robustness with Circuit Breakers
Viaarxiv icon

Improving Alignment and Robustness with Short Circuiting

Add code
Jun 06, 2024
Figure 1 for Improving Alignment and Robustness with Short Circuiting
Figure 2 for Improving Alignment and Robustness with Short Circuiting
Figure 3 for Improving Alignment and Robustness with Short Circuiting
Figure 4 for Improving Alignment and Robustness with Short Circuiting
Viaarxiv icon

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

Add code
Mar 18, 2024
Figure 1 for Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Figure 2 for Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Figure 3 for Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Figure 4 for Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Viaarxiv icon

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Add code
Mar 06, 2024
Figure 1 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Figure 2 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Figure 3 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Figure 4 for The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Viaarxiv icon