Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alexander Lavin

Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery

May 22, 2025

Yanbo Zhang, Sumeer A. Khan, Adnan Mahmud, Huck Yang, Alexander Lavin, Michael Levin, Jeremy Frey, Jared Dunnmon, James Evans, Alan Bundy(+3 more)

Abstract:With recent Nobel Prizes recognising AI contributions to science, Large Language Models (LLMs) are transforming scientific research by enhancing productivity and reshaping the scientific method. LLMs are now involved in experimental design, data analysis, and workflows, particularly in chemistry and biology. However, challenges such as hallucinations and reliability persist. In this contribution, we review how Large Language Models (LLMs) are redefining the scientific method and explore their potential applications across different stages of the scientific cycle, from hypothesis testing to discovery. We conclude that, for LLMs to serve as relevant and effective creative engines and productivity enhancers, their deep integration into all steps of the scientific process should be pursued in collaboration and alignment with human scientific goals, with clear evaluation metrics. The transition to AI-driven science raises ethical questions about creativity, oversight, and responsibility. With careful guidance, LLMs could evolve into creative engines, driving transformative breakthroughs across scientific disciplines responsibly and effectively. However, the scientific community must also decide how much it leaves to LLMs to drive science, even when associations with 'reasoning', mostly currently undeserved, are made in exchange for the potential to explore hypothesis and solution regions that might otherwise remain unexplored by human exploration alone.

* npj Artificial Intelligence, 2025
* 45 pages

Via

Access Paper or Ask Questions

The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

Jul 09, 2023

Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings(+12 more)

Figure 1 for The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

Figure 2 for The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

Figure 3 for The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

Figure 4 for The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

Abstract:Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet, AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Generative AI, in general, and Large Language Models in particular, may represent an opportunity to augment and accelerate the scientific discovery of fundamental deep science with quantitative models. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. Realising these possibilities requires a vision for augmented AI coupled with a diversity of AI approaches able to deal with fundamental aspects of causality analysis and model discovery while enabling unbiased search across the space of putative explanations. These advances hold the promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve. Such a vision would push the boundaries of new fundamental science rather than automatize current workflows and instead open doors for technological innovation to tackle some of the greatest challenges facing humanity today.

* 35 pages, first draft of the final report from the Alan Turing Institute on AI for Scientific Discovery

Via

Access Paper or Ask Questions

Multi-scale Digital Twin: Developing a fast and physics-informed surrogate model for groundwater contamination with uncertain climate models

Nov 20, 2022

Lijing Wang, Takuya Kurihana, Aurelien Meray, Ilijana Mastilovic, Satyarth Praveen, Zexuan Xu, Milad Memarzadeh, Alexander Lavin, Haruko Wainwright

Figure 1 for Multi-scale Digital Twin: Developing a fast and physics-informed surrogate model for groundwater contamination with uncertain climate models

Figure 2 for Multi-scale Digital Twin: Developing a fast and physics-informed surrogate model for groundwater contamination with uncertain climate models

Figure 3 for Multi-scale Digital Twin: Developing a fast and physics-informed surrogate model for groundwater contamination with uncertain climate models

Abstract:Soil and groundwater contamination is a pervasive problem at thousands of locations across the world. Contaminated sites often require decades to remediate or to monitor natural attenuation. Climate change exacerbates the long-term site management problem because extreme precipitation and/or shifts in precipitation/evapotranspiration regimes could re-mobilize contaminants and proliferate affected groundwater. To quickly assess the spatiotemporal variations of groundwater contamination under uncertain climate disturbances, we developed a physics-informed machine learning surrogate model using U-Net enhanced Fourier Neural Operator (U-FNO) to solve Partial Differential Equations (PDEs) of groundwater flow and transport simulations at the site scale.We develop a combined loss function that includes both data-driven factors and physical boundary constraints at multiple spatiotemporal scales. Our U-FNOs can reliably predict the spatiotemporal variations of groundwater flow and contaminant transport properties from 1954 to 2100 with realistic climate projections. In parallel, we develop a convolutional autoencoder combined with online clustering to reduce the dimensionality of the vast historical and projected climate data by quantifying climatic region similarities across the United States. The ML-based unique climate clusters provide climate projections for the surrogate modeling and help return reliable future recharge rate projections immediately without querying large climate datasets. In all, this Multi-scale Digital Twin work can advance the field of environmental remediation under climate change.

* 5 pages, 2 figures, 1 table, Machine Learning and the Physical Sciences workshop, NeurIPS 2022

Via

Access Paper or Ask Questions

Physical Computing for Materials Acceleration Platforms

Aug 17, 2022

Erik Peterson, Alexander Lavin

Figure 1 for Physical Computing for Materials Acceleration Platforms

Figure 2 for Physical Computing for Materials Acceleration Platforms

Abstract:A ''technology lottery'' describes a research idea or technology succeeding over others because it is suited to the available software and hardware, not necessarily because it is superior to alternative directions--examples abound, from the synergies of deep learning and GPUs to the disconnect of urban design and autonomous vehicles. The nascent field of Self-Driving Laboratories (SDL), particularly those implemented as Materials Acceleration Platforms (MAPs), is at risk of an analogous pitfall: the next logical step for building MAPs is to take existing lab equipment and workflows and mix in some AI and automation. In this whitepaper, we argue that the same simulation and AI tools that will accelerate the search for new materials, as part of the MAPs research program, also make possible the design of fundamentally new computing mediums. We need not be constrained by existing biases in science, mechatronics, and general-purpose computing, but rather we can pursue new vectors of engineering physics with advances in cyber-physical learning and closed-loop, self-optimizing systems. Here we outline a simulation-based MAP program to design computers that use physics itself to solve optimization problems. Such systems mitigate the hardware-software-substrate-user information losses present in every other class of MAPs and they perfect alignment between computing problems and computing mediums eliminating any technology lottery. We offer concrete steps toward early ''Physical Computing (PC) -MAP'' advances and the longer term cyber-physical R&D which we expect to introduce a new era of innovative collaboration between materials researchers and computer scientists.

Via

Access Paper or Ask Questions

The Unreasonable Effectiveness of Deep Evidential Regression

May 20, 2022

Nis Meinert, Jakob Gawlikowski, Alexander Lavin

Figure 1 for The Unreasonable Effectiveness of Deep Evidential Regression

Figure 2 for The Unreasonable Effectiveness of Deep Evidential Regression

Figure 3 for The Unreasonable Effectiveness of Deep Evidential Regression

Figure 4 for The Unreasonable Effectiveness of Deep Evidential Regression

Abstract:There is a significant need for principled uncertainty reasoning in machine learning systems as they are increasingly deployed in safety-critical domains. A new approach with uncertainty-aware regression-based neural networks (NNs), based on learning evidential distributions for aleatoric and epistemic uncertainties, shows promise over traditional deterministic methods and typical Bayesian NNs, notably with the capabilities to disentangle aleatoric and epistemic uncertainties. Despite some empirical success of Deep Evidential Regression (DER), there are important gaps in the mathematical foundation that raise the question of why the proposed technique seemingly works. We detail the theoretical shortcomings and analyze the performance on synthetic and real-world data sets, showing that Deep Evidential Regression is a heuristic rather than an exact uncertainty quantification. We go on to propose corrections and redefinitions of how aleatoric and epistemic uncertainties should be extracted from NNs.

* 14 pages, 25 figures

Via

Access Paper or Ask Questions

Simulation Intelligence: Towards a New Generation of Scientific Methods

Dec 06, 2021

Alexander Lavin, Hector Zenil, Brooks Paige, David Krakauer, Justin Gottschlich, Tim Mattson, Anima Anandkumar, Sanjay Choudry, Kamil Rocki, Atılım Güneş Baydin(+13 more)

Figure 1 for Simulation Intelligence: Towards a New Generation of Scientific Methods

Figure 2 for Simulation Intelligence: Towards a New Generation of Scientific Methods

Figure 3 for Simulation Intelligence: Towards a New Generation of Scientific Methods

Figure 4 for Simulation Intelligence: Towards a New Generation of Scientific Methods

Abstract:The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offers immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.

Via

Access Paper or Ask Questions

Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization

May 05, 2021

Björn Lütjens, Brandon Leshchinskiy, Christian Requena-Mesa, Farrukh Chishtie, Natalia Díaz-Rodríguez, Océane Boulais, Aruna Sankaranarayanan, Aaron Piña, Yarin Gal, Chedy Raïssi(+2 more)

Figure 1 for Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization

Figure 2 for Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization

Figure 3 for Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization

Figure 4 for Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization

Abstract:As climate change increases the intensity of natural disasters, society needs better tools for adaptation. Floods, for example, are the most frequent natural disaster, and better tools for flood risk communication could increase the support for flood-resilient infrastructure development. Our work aims to enable more visual communication of large-scale climate impacts via visualizing the output of coastal flood models as satellite imagery. We propose the first deep learning pipeline to ensure physical-consistency in synthetic visual satellite imagery. We advanced a state-of-the-art GAN called pix2pixHD, such that it produces imagery that is physically-consistent with the output of an expert-validated storm surge model (NOAA SLOSH). By evaluating the imagery relative to physics-based flood maps, we find that our proposed framework outperforms baseline models in both physical-consistency and photorealism. We envision our work to be the first step towards a global visualization of how climate change shapes our landscape. Continuing on this path, we show that the proposed pipeline generalizes to visualize arctic sea ice melt. We also publish a dataset of over 25k labelled image-pairs to study image-to-image translation in Earth observation.

* arXiv admin note: text overlap with arXiv:2010.08103

Via

Access Paper or Ask Questions

Multivariate Deep Evidential Regression

Apr 15, 2021

Nis Meinert, Alexander Lavin

Figure 1 for Multivariate Deep Evidential Regression

Figure 2 for Multivariate Deep Evidential Regression

Figure 3 for Multivariate Deep Evidential Regression

Figure 4 for Multivariate Deep Evidential Regression

Abstract:There is significant need for principled uncertainty reasoning in machine learning systems as they are increasingly deployed in safety-critical domains. A new approach with uncertainty-aware neural networks shows promise over traditional deterministic methods, yet several important gaps in the theory and implementation of these networks remain. We discuss three issues with a proposed solution to extract aleatoric and epistemic uncertainties from regression-based neural networks. The aforementioned proposal derives a technique by placing evidential priors over the original Gaussian likelihood function and training the neural network to infer the hyperparemters of the evidential distribution. Doing so allows for the simultaneous extraction of both uncertainties without sampling or utilization of out-of-distribution data for univariate regression tasks. We describe the outstanding issues in detail, provide a possible solution, and generalize the technique for the multivariate case.

* 12 pages, 7 figures

Via

Access Paper or Ask Questions

Technology Readiness Levels for Machine Learning Systems

Jan 11, 2021

Alexander Lavin, Ciarán M. Gilligan-Lee, Alessya Visnjic, Siddha Ganju, Dava Newman, Sujoy Ganguly, Danny Lange, Atılım Güneş Baydin, Amit Sharma, Adam Gibson(+4 more)

Figure 1 for Technology Readiness Levels for Machine Learning Systems

Figure 2 for Technology Readiness Levels for Machine Learning Systems

Abstract:The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. The lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, where mission critical measures and robustness are ingrained in the development process. Drawing on experience in both spacecraft engineering and ML (from research through product across domain areas), we have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" (MLTRL) framework defines a principled process to ensure robust, reliable, and responsible systems while being streamlined for ML workflows, including key distinctions from traditional software engineering. Even more, MLTRL defines a lingua franca for people across teams and organizations to work collaboratively on artificial intelligence and machine learning technologies. Here we describe the framework and elucidate it with several real world use-cases of developing ML methods from basic research through productization and deployment, in areas such as medical diagnostics, consumer computer vision, satellite imagery, and particle physics.

Via

Access Paper or Ask Questions

Learnings from Frontier Development Lab and SpaceML -- AI Accelerators for NASA and ESA

Nov 09, 2020

Siddha Ganju, Anirudh Koul, Alexander Lavin, Josh Veitch-Michaelis, Meher Kasam, James Parr

Figure 1 for Learnings from Frontier Development Lab and SpaceML -- AI Accelerators for NASA and ESA

Figure 2 for Learnings from Frontier Development Lab and SpaceML -- AI Accelerators for NASA and ESA

Figure 3 for Learnings from Frontier Development Lab and SpaceML -- AI Accelerators for NASA and ESA

Abstract:Research with AI and ML technologies lives in a variety of settings with often asynchronous goals and timelines: academic labs and government organizations pursue open-ended research focusing on discoveries with long-term value, while research in industry is driven by commercial pursuits and hence focuses on short-term timelines and return on investment. The journey from research to product is often tacit or ad hoc, resulting in technology transition failures, further exacerbated when research and development is interorganizational and interdisciplinary. Even more, much of the ability to produce results remains locked in the private repositories and know-how of the individual researcher, slowing the impact on future research by others and contributing to the ML community's challenges in reproducibility. With research organizations focused on an exploding array of fields, opportunities for the handover and maturation of interdisciplinary research reduce. With these tensions, we see an emerging need to measure the correctness, impact, and relevance of research during its development to enable better collaboration, improved reproducibility, faster progress, and more trusted outcomes. We perform a case study of the Frontier Development Lab (FDL), an AI accelerator under a public-private partnership from NASA and ESA. FDL research follows principled practices that are grounded in responsible development, conduct, and dissemination of AI research, enabling FDL to churn successful interdisciplinary and interorganizational research projects, measured through NASA's Technology Readiness Levels. We also take a look at the SpaceML Open Source Research Program, which helps accelerate and transition FDL's research to deployable projects with wide spread adoption amongst citizen scientists.

Via

Access Paper or Ask Questions