Carolina Cuesta-Lazaro

Center for Astrophysics | Harvard & Smithsonian

CosmoFlow: Scale-Aware Representation Learning for Cosmology with Flow Matching

Jul 16, 2025

Sidharth Kannan, Tian Qiu, Carolina Cuesta-Lazaro, Haewon Jeong

Abstract:Generative machine learning models have been shown to learn low-dimensional representations of data that preserve the information required for downstream tasks. In this work, we demonstrate that flow-matching-based generative models can learn compact, semantically rich latent representations of field-level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 32x smaller than the raw field data that are usable for field-level reconstruction, synthetic data generation, and parameter inference. Our model also learns interpretable representations, in which different latent channels correspond to features at different cosmological scales.
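For readers unfamiliar with flow matching, the following is a minimal sketch of a generic training objective, assuming a straight-line path between Gaussian noise and data. It is not CosmoFlow's architecture or code; the function names `sample_path` and `flow_matching_loss` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_path(x1, rng):
    """Draw noise, a random time, and the point on the straight line between them.

    Assumption: a linear interpolation path x_t = (1 - t) * x0 + t * x1,
    whose velocity along the path is the constant x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)        # noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))    # one time in [0, 1) per example
    x_t = (1.0 - t) * x0 + t * x1             # point on the path at time t
    target = x1 - x0                          # velocity the model should regress
    return x_t, t, target

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))
```

In training, a neural network would take `x_t` and `t` (plus any conditioning, such as a latent representation) and its output would be passed as `predicted_velocity`.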

A Cosmic-Scale Benchmark for Symmetry-Preserving Data Processing

Oct 27, 2024

Julia Balla, Siddharth Mishra-Sharma, Carolina Cuesta-Lazaro, Tommi Jaakkola, Tess Smidt

Abstract:Efficiently processing structured point cloud data while preserving multiscale information is a key challenge across domains, from graphics to atomistic modeling. Using a curated dataset of simulated galaxy positions and properties, represented as point clouds, we benchmark the ability of graph neural networks to simultaneously capture local clustering environments and long-range correlations. Given the homogeneous and isotropic nature of the Universe, the data exhibits a high degree of symmetry. We therefore focus on evaluating the performance of Euclidean symmetry-preserving ($E(3)$-equivariant) graph neural networks, showing that they can outperform non-equivariant counterparts and domain-specific information extraction techniques in both downstream performance and simulation efficiency. However, we find that current architectures fail to capture information from long-range correlations as effectively as domain-specific baselines, motivating future work on architectures better suited for extracting long-range information.

* 19 pages, 3 figures; To appear at the NeurReps Workshop @ NeurIPS 2024

How DREAMS are made: Emulating Satellite Galaxy and Subhalo Populations with Diffusion Models and Point Clouds

Sep 04, 2024

Tri Nguyen, Francisco Villaescusa-Navarro, Siddharth Mishra-Sharma, Carolina Cuesta-Lazaro, Paul Torrey, Arya Farahi, Alex M. Garcia, Jonah C. Rose, Stephanie O'Neil, Mark Vogelsberger (+9 more)

Abstract:The connection between galaxies and their host dark matter (DM) halos is critical to our understanding of cosmology, galaxy formation, and DM physics. To maximize the return of upcoming cosmological surveys, we need an accurate way to model this complex relationship. Many techniques have been developed to model this connection, from Halo Occupation Distribution (HOD) to empirical and semi-analytic models to hydrodynamic simulations. Hydrodynamic simulations can incorporate more detailed astrophysical processes but are computationally expensive; HODs, on the other hand, are computationally cheap but have limited accuracy. In this work, we present NeHOD, a generative framework based on a variational diffusion model and a Transformer, for painting galaxies/subhalos on top of DM with the accuracy of hydrodynamic simulations but at a computational cost similar to that of HOD. By modeling galaxies/subhalos as point clouds, instead of binning or voxelization, we can resolve small spatial scales down to the resolution of the simulations. For each halo, NeHOD predicts the positions, velocities, masses, and concentrations of its central and satellite galaxies. We train NeHOD on the TNG-Warm DM suite of the DREAMS project, which consists of 1024 high-resolution zoom-in hydrodynamic simulations of Milky Way-mass halos with varying warm DM mass and astrophysical parameters. We show that our model captures the complex relationships between subhalo properties as a function of the simulation parameters, including the mass functions, stellar-halo mass relations, concentration-mass relations, and spatial clustering. Our method can be used for a large variety of downstream applications, from galaxy clustering to strong lensing studies.

* Submitted to ApJ; 30 + 6 pages; 11 + 4 figures; Comments welcomed

Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo

May 08, 2024

Nayantara Mudur, Carolina Cuesta-Lazaro, Douglas P. Finkbeiner

Abstract:Diffusion generative models have excelled at diverse image generation and reconstruction tasks across fields. A less explored avenue is their application to discriminative tasks involving regression or classification problems. The cornerstone of modern cosmology is the ability to generate predictions for observed astrophysical fields from theory and constrain physical models from observations using these predictions. This work uses a single diffusion generative model to address these interlinked objectives: as a surrogate model or emulator for cold dark matter density fields conditional on input cosmological parameters, and as a parameter inference model that solves the inverse problem of constraining the cosmological parameters of an input field. The model is able to emulate fields with summary statistics consistent with those of the simulated target distribution. We then leverage the approximate likelihood of the diffusion generative model to derive tight constraints on cosmology by using the Hamiltonian Monte Carlo method to sample the posterior on cosmological parameters for a given test image. Finally, we demonstrate that this parameter inference approach is more robust to the addition of noise than baseline parameter inference networks.

* 14 pages, 10 figures
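As an illustration of the sampling step described in the abstract, here is a minimal Hamiltonian Monte Carlo sketch with a leapfrog integrator. It is an assumption-laden toy, not the paper's implementation: in the paper the likelihood comes from the diffusion model, whereas here a simple Gaussian log-density stands in, and all function names and numerical values are hypothetical.

```python
import numpy as np

def gaussian_log_prob(theta, mean, sigma):
    """Toy stand-in posterior: isotropic Gaussian log-density (up to a constant)."""
    return -0.5 * np.sum(((theta - mean) / sigma) ** 2)

def gaussian_grad(theta, mean, sigma):
    """Gradient of the toy log-density with respect to theta."""
    return -(theta - mean) / sigma ** 2

def hmc_sample(log_prob, grad_log_prob, theta0, n_samples, step_size, n_leapfrog, rng):
    """Draw samples with basic HMC (unit mass matrix, fixed trajectory length)."""
    theta = np.asarray(theta0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p0 = rng.standard_normal(theta.shape)   # resample momentum
        theta_new, p = theta.copy(), p0.copy()
        # Leapfrog integration: half momentum step, alternating full steps,
        # then a closing half momentum step.
        p = p + 0.5 * step_size * grad_log_prob(theta_new)
        for i in range(n_leapfrog):
            theta_new = theta_new + step_size * p
            if i < n_leapfrog - 1:
                p = p + step_size * grad_log_prob(theta_new)
        p = p + 0.5 * step_size * grad_log_prob(theta_new)
        # Metropolis correction using the change in total energy.
        h_old = -log_prob(theta) + 0.5 * np.sum(p0 ** 2)
        h_new = -log_prob(theta_new) + 0.5 * np.sum(p ** 2)
        if np.log(rng.uniform()) < h_old - h_new:
            theta = theta_new
        samples.append(theta.copy())
    return np.array(samples)
```

A call such as `hmc_sample(lambda th: gaussian_log_prob(th, mean, sigma), lambda th: gaussian_grad(th, mean, sigma), [0.5, 0.5], 2000, 0.02, 10, np.random.default_rng(0))` returns an array of shape `(2000, 2)`. The step size should be comparable to or smaller than the posterior width for the integrator to remain stable.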

LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology

Feb 06, 2024

Matthew Ho, Deaglan J. Bartlett, Nicolas Chartier, Carolina Cuesta-Lazaro, Simon Ding, Axel Lapel, Pablo Lemos, Christopher C. Lovell, T. Lucas Makinen, Chirag Modi (+5 more)

Abstract:This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schema, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable, designed for efficient exploration of modeling hyperparameters. To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as: estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterising progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at https://github.com/maho3/ltu-ili.

* 20 pages, 10 figures, submitted to the Open Journal of Astrophysics. Code available at https://github.com/maho3/ltu-ili

Cosmological Field Emulation and Parameter Inference with Diffusion Models

Dec 12, 2023

Nayantara Mudur, Carolina Cuesta-Lazaro, Douglas P. Finkbeiner

Abstract:Cosmological simulations play a crucial role in elucidating the effect of physical parameters on the statistics of fields and in constraining parameters given information on density fields. We leverage diffusion generative models to address two tasks of importance to cosmology: as an emulator for cold dark matter density fields conditional on input cosmological parameters $\Omega_m$ and $\sigma_8$, and as a parameter inference model that can return constraints on the cosmological parameters of an input field. We show that the model is able to generate fields with power spectra that are consistent with those of the simulated target distribution, and capture the subtle effect of each parameter on modulations in the power spectrum. We additionally explore the utility of these models for parameter inference and find that we can obtain tight constraints on cosmological parameters.

* 7 pages, 5 figures, Accepted at the Machine Learning and the Physical Sciences workshop, NeurIPS 2023

A point cloud approach to generative modeling for galaxy surveys at the field level

Nov 28, 2023

Carolina Cuesta-Lazaro, Siddharth Mishra-Sharma

Abstract:We introduce a diffusion-based generative model to describe the distribution of galaxies in our Universe directly as a collection of points in 3-D space (coordinates) optionally with associated attributes (e.g., velocities and masses), without resorting to binning or voxelization. The custom diffusion model can be used both for emulation, reproducing essential summary statistics of the galaxy distribution, and for inference, by computing the conditional likelihood of a galaxy field. We demonstrate a first application to massive dark matter haloes in the Quijote simulation suite. This approach can be extended to enable a comprehensive analysis of cosmological data, circumventing limitations inherent to both summary-statistic-based and neural simulation-based inference methods.

* 15+3 pages, 7+4 figures

Probabilistic reconstruction of Dark Matter fields from biased tracers using diffusion models

Nov 14, 2023

Core Francisco Park, Victoria Ono, Nayantara Mudur, Yueying Ni, Carolina Cuesta-Lazaro

Abstract:Galaxies are biased tracers of the underlying cosmic web, which is dominated by dark matter components that cannot be directly observed. The relationship between dark matter density fields and galaxy distributions can be sensitive to assumptions about cosmology and to the astrophysical processes embedded in galaxy formation models, which remain uncertain in many respects. Based on state-of-the-art galaxy formation simulation suites with varied cosmological parameters and sub-grid astrophysics, we develop a diffusion generative model to predict the unbiased posterior distribution of the underlying dark matter fields from the given stellar mass fields, while being able to marginalize over the uncertainties in cosmology and galaxy formation.

Simulation-based Inference for Exoplanet Atmospheric Retrieval: Insights from winning the Ariel Data Challenge 2023 using Normalizing Flows

Sep 17, 2023

Mayeul Aubin, Carolina Cuesta-Lazaro, Ethan Tregidga, Javier Viaña, Cecilia Garraffo, Iouli E. Gordon, Mercedes López-Morales, Robert J. Hargreaves, Vladimir Yu. Makhnev, Jeremy J. Drake (+2 more)

Abstract:Advancements in space telescopes have opened new avenues for gathering vast amounts of data on exoplanet atmosphere spectra. However, accurately extracting chemical and physical properties from these spectra poses significant challenges due to the non-linear nature of the underlying physics. This paper presents novel machine learning models developed by the AstroAI team for the Ariel Data Challenge 2023, where one of the models secured the top position among 293 competitors. Leveraging Normalizing Flows, our models predict the posterior probability distribution of atmospheric parameters under different atmospheric assumptions. Moreover, we introduce an alternative model that exhibits higher performance potential than the winning model, despite scoring lower in the challenge. These findings highlight the need to reevaluate the evaluation metric and prompt further exploration of more efficient and accurate approaches for exoplanet atmosphere spectra analysis. Finally, we present recommendations to enhance the challenge and models, providing valuable insights for future applications on real observational data. These advancements pave the way for more effective and timely analysis of exoplanet atmospheric properties, advancing our understanding of these distant worlds.

* Conference proceeding for the ECML PKDD 2023

XNet: A convolutional neural network (CNN) implementation for medical X-Ray image segmentation suitable for small datasets

Dec 03, 2018

Joseph Bullock, Carolina Cuesta-Lazaro, Arnau Quera-Bofarull

Abstract:X-Ray image enhancement, along with many other medical image processing applications, requires the segmentation of images into bone, soft tissue, and open beam regions. We apply a machine learning approach to this problem, presenting an end-to-end solution which results in robust and efficient inference. Since medical institutions frequently do not have the resources to process and label the large quantity of X-Ray images usually needed for neural network training, we design an end-to-end solution for small datasets, while achieving state-of-the-art results. Our implementation produces an overall accuracy of 92%, F1 score of 0.92, and an AUC of 0.98, surpassing classical image processing techniques, such as clustering and entropy based methods, while improving upon the output of existing neural networks used for segmentation in non-medical contexts. The code used for this project is available online.

* 11 pages, 5 figures, 2 tables, accepted for SPIE Proceedings, Medical Imaging 2018: Biomedical Applications in Molecular, Structural, and Functional Imaging
