Michael McCabe

Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation

Jul 03, 2025

François Rozet, Ruben Ohana, Michael McCabe, Gilles Louppe, François Lanusse, Shirley Ho

Abstract: The steep computational cost of diffusion models at inference hinders their use as fast physics emulators. In the context of image and video generation, this computational drawback has been addressed by generating in the latent space of an autoencoder instead of the pixel space. In this work, we investigate whether a similar strategy can be effectively applied to the emulation of dynamical systems and at what cost. We find that the accuracy of latent-space emulation is surprisingly robust to a wide range of compression rates (up to 1000x). We also show that diffusion-based emulators are consistently more accurate than non-generative counterparts and compensate for uncertainty in their predictions with greater diversity. Finally, we cover practical design choices, spanning from architectures to optimizers, that we found critical to train latent-space emulators.

The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

Nov 30, 2024

Ruben Ohana, Michael McCabe, Lucas Meyer, Rudy Morel, Fruzsina J. Agocs, Miguel Beneitez, Marsha Berger, Blakesley Burkhart, Stuart B. Dalziel, Drummond B. Fielding(+16 more)

Abstract: Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.

* 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

May 30, 2024

Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, Miles Cranmer, Keiya Hirashima, Geraud Krawezik, Nicholas Lourie, Michael McCabe, Rudy Morel(+5 more)

Abstract: Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enhancing our understanding of Transformers in quantitative and scientific contexts. This task requires precise localization and computation within datasets, akin to object detection or region-based scientific analysis. We present theoretical and empirical analysis using both causal and non-causal Transformer architectures, investigating the influence of various positional encodings on performance and interpretability. In particular, we find that causal attention is much better suited for the task, and that no positional embeddings lead to the best accuracy, though rotary embeddings are competitive and easier to train. We also show that out-of-distribution performance is tightly linked to which tokens the model uses as a bias term.

xVal: A Continuous Number Encoding for Large Language Models

Oct 04, 2023

Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker(+4 more)

Abstract: Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a single token. xVal represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an inductive bias that is generally more suitable for applications in scientific domains. We empirically evaluate our proposal on a number of synthetic and real-world datasets. Compared with existing number encoding schemes, we find that xVal is more token-efficient and demonstrates improved generalization.

* 10 pages, 7 figures. Supplementary: 5 pages, 2 figures
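
The encoding idea in the xVal abstract (one dedicated embedding vector, scaled by the number's value) can be illustrated with a minimal sketch. This is not the authors' implementation: the `[NUM]` token name, the toy vocabulary, the random embedding table, and the helper names are all assumptions made for illustration, and the sketch skips any normalization of the input numbers.

```python
import random


def make_embedding_table(vocab, dim, seed=0):
    """Build a toy embedding table: one random vector per vocabulary token."""
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)] for tok in vocab}


def embed_sequence(tokens, table, num_token="[NUM]"):
    """Embed a mixed sequence of text tokens and numbers.

    Every number maps to the same dedicated vector (hypothetical "[NUM]"),
    multiplied elementwise by the number's value. Text tokens are looked up
    unchanged.
    """
    embedded = []
    for tok in tokens:
        if isinstance(tok, (int, float)):
            embedded.append([tok * x for x in table[num_token]])
        else:
            embedded.append(list(table[tok]))
    return embedded


# Toy vocabulary; "[NUM]" is the single token shared by all numbers.
vocab = ["[NUM]", "mass", "=", "kg"]
table = make_embedding_table(vocab, dim=4)

sequence = ["mass", "=", 2.5, "kg"]
embedded = embed_sequence(sequence, table)
```

Because the number enters only as a scale factor, nearby values produce nearby embeddings, which is the continuity property the abstract refers to. Every number also costs exactly one token regardless of how many digits it has.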

AstroCLIP: Cross-Modal Pre-Training for Astronomical Foundation Models

Oct 04, 2023

Francois Lanusse, Liam Parker, Siavash Golkar, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee(+4 more)

Abstract: We present AstroCLIP, a strategy to facilitate the construction of astronomical foundation models that bridge the gap between diverse observational modalities. We demonstrate that a cross-modal contrastive learning approach between images and optical spectra of galaxies yields highly informative embeddings of both modalities. In particular, we apply our method on multi-band images and optical spectra from the Dark Energy Spectroscopic Instrument (DESI), and show that: (1) these embeddings are well-aligned between modalities and can be used for accurate cross-modal searches, and (2) these embeddings encode valuable physical information about the galaxies -- in particular redshift and stellar mass -- that can be used to achieve competitive zero- and few-shot predictions without further finetuning. Additionally, in the process of developing our approach, we also construct a novel, transformer-based model and pretraining approach for processing galaxy spectra.

* Submitted to the NeurIPS 2023 AI4Science Workshop

Multiple Physics Pretraining for Physical Surrogate Models

Oct 04, 2023

Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse(+4 more)

Abstract: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling. MPP involves training large surrogate models to predict the dynamics of multiple heterogeneous physical systems simultaneously by learning features that are broadly useful across diverse physical tasks. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a single shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on new physics compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility and community experimentation.

Towards Stability of Autoregressive Neural Operators

Jun 18, 2023

Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown

Abstract: Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense -- these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to prototypical neural networks leads to significantly lower errors in long-range forecasts, with forecasts running 800% longer without qualitative signs of divergence, compared to the original models for these systems. We open-source our code (https://anonymous.4open.science/r/stabilizing_neural_operators-5774/) for reproducibility.
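
The autoregressive time-stepping described in this abstract, and the way small one-step errors can compound over a rollout, can be illustrated with a toy sketch. It uses a scalar system with a deliberately imperfect step function, not the neural operators studied in the paper. The names `rollout`, `true_step`, and `learned_step` and the 5% one-step error are assumptions chosen for illustration.

```python
def rollout(step, x0, n_steps):
    """Apply `step` repeatedly, feeding each prediction back in as the next input."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


def true_step(x):
    """Reference dynamics for the toy system: the state stays constant."""
    return 1.0 * x


def learned_step(x):
    """Stand-in for an imperfect emulator: overshoots by 5% at every step."""
    return 1.05 * x


n_steps = 50
reference = rollout(true_step, 1.0, n_steps)
predicted = rollout(learned_step, 1.0, n_steps)

# Absolute error at each time step of the rollout.
errors = [abs(p - r) for p, r in zip(predicted, reference)]
```

In this toy case the one-step error is small, but since each prediction becomes the next input, the error is multiplied at every step and grows geometrically with rollout length.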

Learning to Assimilate in Chaotic Dynamical Systems

Nov 01, 2021

Michael McCabe, Jed Brown

Abstract: The accuracy of simulation-based forecasting in chaotic systems is heavily dependent on high-quality estimates of the system state at the time the forecast is initialized. Data assimilation methods are used to infer these initial conditions by systematically combining noisy, incomplete observations and numerical models of system dynamics to produce effective estimation schemes. We introduce amortized assimilation, a framework for learning to assimilate in dynamical systems from sequences of noisy observations with no need for ground truth data. We motivate the framework by extending powerful results from self-supervised denoising to the dynamical systems setting through the use of differentiable simulation. Experimental results across several benchmark systems highlight the improved effectiveness of our approach over widely-used data assimilation methods.

Mapper Comparison with Wasserstein Metrics

Dec 15, 2018

Michael McCabe

Abstract: The challenge of describing model drift is an open question in unsupervised learning. It can be difficult to evaluate at what point an unsupervised model has deviated beyond what would be expected from a different sample from the same population. This is particularly true for models without a probabilistic interpretation. One such family of techniques, Topological Data Analysis, and the Mapper algorithm in particular, has found use in a variety of fields. However, describing model drift for Mapper graphs is an understudied area: even existing techniques for measuring distances between related constructs like graphs or simplicial complexes fail to account for the fact that Mapper graphs represent a combination of topological, metric, and density information. In this paper, we develop an optimal-transport-based metric, which we call the Network Augmented Wasserstein Distance, for evaluating distances between Mapper graphs. We demonstrate the value of the metric for model drift analysis by using it to transform the model drift problem into an anomaly detection problem over dynamic graphs.
