Danny Halawi

ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities

Sep 30, 2024
Via arXiv

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

Jun 28, 2024
Via arXiv

Dominion: A New Frontier for AI Research

May 10, 2024
Via arXiv

Approaching Human-Level Forecasting with Language Models

Feb 28, 2024
Via arXiv

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

Jul 18, 2023
Via arXiv

Eliciting Latent Predictions from Transformers with the Tuned Lens

Mar 15, 2023
Via arXiv