Dan Hendrycks

UC Berkeley

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Mar 05, 2025

Beyond Release: Access Considerations for Generative AI Systems
Feb 23, 2025

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges
Feb 13, 2025

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Feb 12, 2025

Humanity's Last Exam
Jan 24, 2025

Introduction to AI Safety, Ethics, and Society
Nov 01, 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Oct 11, 2024

LLM-PBE: Assessing Data Privacy in Large Language Models
Aug 23, 2024

Tamper-Resistant Safeguards for Open-Weight LLMs
Aug 01, 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Jul 31, 2024