Gabriel Mukobi

AI Consciousness and Public Perceptions: Four Futures

Aug 08, 2024
Via arXiv

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Jul 31, 2024
Via arXiv

Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

Jun 06, 2024
Via arXiv

Societal Adaptation to Advanced AI

May 16, 2024
Via arXiv

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Mar 06, 2024
Via arXiv

Escalation Risks from Language Models in Military and Diplomatic Decision-Making

Jan 07, 2024
Via arXiv

SuperHF: Supervised Iterative Learning from Human Feedback

Oct 25, 2023
Via arXiv

Welfare Diplomacy: Benchmarking Language Model Cooperation

Oct 13, 2023
Via arXiv