Henry Papadatos

Evaluating the Goal-Directedness of Large Language Models

Apr 16, 2025
Via arXiv

Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation

Mar 06, 2025
Via arXiv

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

Feb 10, 2025
Via arXiv

Linear Probe Penalties Reduce LLM Sycophancy

Dec 01, 2024
Via arXiv