Julian Michael

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

Nov 03, 2025
Via arXiv

Remote Labor Index: Measuring AI Automation of Remote Work

Oct 30, 2025
Via arXiv

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

FORTRESS: Frontier Risk Evaluation for National Security and Public Safety

Jun 17, 2025
Via arXiv

International AI Safety Report

Jan 29, 2025
Via arXiv

Alignment faking in large language models

Dec 18, 2024
Via arXiv

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

Nov 12, 2024
Via arXiv

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Sep 25, 2024
Via arXiv

Analyzing the Role of Semantic Representations in the Era of Large Language Models

May 02, 2024
Via arXiv