Dan Hendrycks

UC Berkeley

MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Mar 19, 2025

Superintelligence Strategy: Expert Version

Mar 07, 2025

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

Mar 05, 2025

Beyond Release: Access Considerations for Generative AI Systems

Feb 23, 2025

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

Feb 13, 2025

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Feb 12, 2025

Humanity's Last Exam

Jan 24, 2025

Introduction to AI Safety, Ethics, and Society

Nov 01, 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Oct 11, 2024

LLM-PBE: Assessing Data Privacy in Large Language Models

Aug 23, 2024