Abstract:Language-based AI systems are diffusing into society, bringing positive and negative impacts. Mitigating negative impacts depends on accurate impact assessments, drawn from an empirical evidence base that makes causal connections between AI usage and impacts. Interconnected post-deployment monitoring combines information about model integration and use, application use, and incidents and impacts. For example, inference time monitoring of chain-of-thought reasoning can be combined with long-term monitoring of sectoral AI diffusion, impacts and incidents. Drawing on information sharing mechanisms in other industries, we highlight example data sources and specific data points that governments could collect to inform AI risk management.
Abstract:Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. However, this approach becomes less feasible as the number of developers of advanced AI grows, and impedes beneficial use-cases as well as harmful ones. In response, we urge a complementary approach: increasing societal adaptation to advanced AI, that is, reducing the expected negative impacts from a given level of diffusion of a given AI capability. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against and remedy potentially harmful uses of AI systems, illustrated with examples in election manipulation, cyberterrorism, and loss of control to AI decision-makers. We discuss a three-step cycle that society can implement to adapt to AI. Increasing society's ability to implement this cycle builds its resilience to advanced AI. We conclude with concrete recommendations for governments, industry, and third-parties.