Roman Hauksson

Governing dual-use technologies: Case studies of international security agreements and lessons for AI governance

Sep 04, 2024

Akash R. Wasil, Peter Barnett, Michael Gerovitch, Roman Hauksson, Tom Reed, Jack William Miller

Abstract: International AI governance agreements and institutions may play an important role in reducing global security risks from advanced AI. To inform the design of such agreements and institutions, we conducted case studies of historical and contemporary international security agreements. We focused specifically on those arrangements around dual-use technologies, examining agreements in nuclear security, chemical weapons, biosecurity, and export controls. For each agreement, we examined four key areas: (a) purpose, (b) core powers, (c) governance structure, and (d) instances of non-compliance. From these case studies, we extracted lessons for the design of international AI agreements and governance institutions. We discuss the importance of robust verification methods, strategies for balancing power between nations, mechanisms for adapting to rapid technological change, approaches to managing trade-offs between transparency and security, incentives for participation, and effective enforcement mechanisms.

GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents

Jun 07, 2024

Anthony Costarelli, Mat Allen, Roman Hauksson, Grace Sodunke, Suhas Hariharan, Carlson Cheng, Wenjie Li, Arjun Yadav

Abstract: Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using large language models in complex, strategic scenarios, a comprehensive framework for evaluating agents' performance across the various types of reasoning found in games is still lacking. To address this gap, we introduce GameBench, a cross-domain benchmark for evaluating strategic reasoning abilities of LLM agents. We focus on 9 different game environments, each covering at least one axis of the key reasoning skills identified in strategy games, and select games for which strategy explanations are unlikely to form a significant portion of models' pretraining corpora. Our evaluations use GPT-3 and GPT-4 in their base form along with two scaffolding frameworks designed to enhance strategic reasoning ability: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP). Our results show that none of the tested models match human performance, and at worst GPT-4 performs worse than random action. CoT and RAP both improve scores, but not to levels comparable with human performance.
