Sultan Hassan

Towards out-of-distribution generalization in large-scale astronomical surveys: robust networks learn similar representations

Nov 29, 2023

Yash Gondhalekar, Sultan Hassan, Naomi Saphra, Sambatra Andrianomena

Abstract: The generalization of machine learning (ML) models to out-of-distribution (OOD) examples remains a key challenge in extracting information from upcoming astronomical surveys. Interpretability approaches are a natural way to gain insights into the OOD generalization problem. We use Centered Kernel Alignment (CKA), a similarity metric for neural network representations, to examine the relationship between representation similarity and performance of pre-trained Convolutional Neural Networks (CNNs) on the CAMELS Multifield Dataset. We find that when models are robust to a distribution shift, they produce substantially different representations across their layers on OOD data. However, when they fail to generalize, these representations change less from layer to layer on OOD data. We discuss the potential application of representation similarity in guiding model design, training strategy, and mitigating the OOD problem by incorporating CKA as an inductive bias during training.

* Accepted to Machine Learning and the Physical Sciences Workshop, NeurIPS 2023
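
For readers unfamiliar with CKA, the following is a minimal sketch of the linear variant computed on two activation matrices. It is an illustration only: the abstract does not say which CKA variant or implementation the authors use, and the function name `linear_cka`, the array shapes, and the random placeholder activations are all assumptions made here.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_examples, n_features).

    The two matrices must share the same examples (rows) but may have
    different numbers of features (columns).
    """
    # Center each feature so the similarity ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Placeholder activations (hypothetical, not CAMELS data).
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 32))

# A rotated and rescaled copy of acts_a: linear CKA is invariant to
# orthogonal transformations and isotropic scaling.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
acts_b = 3.0 * acts_a @ rotation

# Independent random activations with a different feature count.
acts_c = rng.normal(size=(200, 16))

cka_same = linear_cka(acts_a, acts_b)
cka_rand = linear_cka(acts_a, acts_c)
print(f"rotated copy: {cka_same:.3f}, independent: {cka_rand:.3f}")
```

In this sketch a value near 1 means two layers encode the same information up to rotation and scale, while unrelated activations score much lower. Comparing every pair of layers on in-distribution and OOD inputs would give the kind of layer-by-layer similarity picture the abstract describes.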

The CAMELS project: public data release

Jan 04, 2022

Francisco Villaescusa-Navarro, Shy Genel, Daniel Anglés-Alcázar, Lucia A. Perez, Pablo Villanueva-Domingo, Digvijay Wadekar, Helen Shao, Faizan G. Mohammad, Sultan Hassan, Emily Moser (+37 more)

Abstract: The Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project was developed to combine cosmology with astrophysics through thousands of cosmological hydrodynamic simulations and machine learning. CAMELS contains 4,233 cosmological simulations: 2,049 N-body and 2,184 state-of-the-art hydrodynamic simulations that sample a vast volume in parameter space. In this paper we present the CAMELS public data release, describing the characteristics of the CAMELS simulations and a variety of data products generated from them, including halo, subhalo, galaxy, and void catalogues, power spectra, bispectra, Lyman-$\alpha$ spectra, probability distribution functions, halo radial profiles, and X-ray photon lists. We also release over one thousand catalogues that contain billions of galaxies from CAMELS-SAM: a large collection of N-body simulations that have been combined with the Santa Cruz Semi-Analytic Model. We release all the data, comprising more than 350 terabytes and containing 143,922 snapshots, millions of halos, galaxies and summary statistics. We provide further technical details on how to access, download, read, and process the data at https://camels.readthedocs.io.

* 18 pages, 3 figures. More than 350 TB of data from thousands of simulations publicly available at https://www.camel-simulations.org

The CAMELS Multifield Dataset: Learning the Universe's Fundamental Parameters with Artificial Intelligence

Sep 22, 2021

Francisco Villaescusa-Navarro, Shy Genel, Daniel Anglés-Alcázar, Leander Thiele, Romeel Davé, Desika Narayanan, Andrina Nicola, Yin Li, Pablo Villanueva-Domingo, Benjamin Wandelt (+18 more)

Abstract: We present the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) Multifield Dataset, CMD, a collection of hundreds of thousands of 2D maps and 3D grids containing many different properties of cosmic gas, dark matter, and stars from 2,000 distinct simulated universes at several cosmic times. The 2D maps and 3D grids represent cosmic regions that span $\sim$100 million light years and have been generated from thousands of state-of-the-art hydrodynamic and gravity-only N-body simulations from the CAMELS project. Designed to train machine learning models, CMD is the largest dataset of its kind, containing more than 70 terabytes of data. In this paper we describe CMD in detail and outline a few of its applications. We focus our attention on one such task, parameter inference, formulating the problems we face as a challenge to the community. We release all data and provide further technical details at https://camels-multifield-dataset.readthedocs.io.

* 17 pages, 1 figure. Third paper of a series of four. Hundreds of thousands of labeled 2D maps and 3D grids from thousands of simulated universes publicly available at https://camels-multifield-dataset.readthedocs.io
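
The "$\sim$100 million light years" quoted in this abstract can be related to the $25\,h^{-1}{\rm Mpc}$ map side length quoted in the companion abstracts on this page. The sketch below does that unit conversion. The value `h=0.7` is a round illustrative assumption, not a number taken from the abstracts, and the helper name `hinv_mpc_to_mly` is hypothetical.

```python
# 1 parsec expressed in light years (standard conversion factor).
LIGHT_YEARS_PER_PARSEC = 3.26156


def hinv_mpc_to_mly(length_hinv_mpc, h):
    """Convert a length in h^-1 Mpc to millions of light years.

    `h` is the dimensionless Hubble parameter, H0 / (100 km/s/Mpc).
    """
    length_mpc = length_hinv_mpc / h
    # Mpc -> Mly uses the same factor as pc -> ly, since both carry a 1e6.
    return length_mpc * LIGHT_YEARS_PER_PARSEC


# Assumed h = 0.7 for illustration only.
side_mly = hinv_mpc_to_mly(25.0, h=0.7)
print(f"25 h^-1 Mpc is about {side_mly:.0f} million light years for h = 0.7")
```

For any plausible value of `h` the result is of order 100 million light years, consistent with the rounded figure in the abstract.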

Robust marginalization of baryonic effects for cosmological inference at the field level

Sep 21, 2021

Francisco Villaescusa-Navarro, Shy Genel, Daniel Anglés-Alcázar, David N. Spergel, Yin Li, Benjamin Wandelt, Leander Thiele, Andrina Nicola, Jose Manuel Zorrilla Matilla, Helen Shao (+4 more)

Abstract: We train neural networks to perform likelihood-free inference from $(25\,h^{-1}{\rm Mpc})^2$ 2D maps containing the total mass surface density from thousands of hydrodynamic simulations of the CAMELS project. We show that the networks can extract information beyond one-point functions and power spectra from all resolved scales ($\gtrsim 100\,h^{-1}{\rm kpc}$) while performing a robust marginalization over baryonic physics at the field level: the model can infer the value of $\Omega_{\rm m} (\pm 4\%)$ and $\sigma_8 (\pm 2.5\%)$ from simulations completely different from the ones used to train it.

* 7 pages, 4 figures. Second paper of a series of four. The 2D maps, codes, and network weights used in this paper are publicly available at https://camels-multifield-dataset.readthedocs.io

Multifield Cosmology with Artificial Intelligence

Sep 20, 2021

Francisco Villaescusa-Navarro, Daniel Anglés-Alcázar, Shy Genel, David N. Spergel, Yin Li, Benjamin Wandelt, Andrina Nicola, Leander Thiele, Sultan Hassan, Jose Manuel Zorrilla Matilla (+3 more)

Abstract: Astrophysical processes such as feedback from supernovae and active galactic nuclei modify the properties and spatial distribution of dark matter, gas, and galaxies in a poorly understood way. This uncertainty is one of the main theoretical obstacles to extracting information from cosmological surveys. We use 2,000 state-of-the-art hydrodynamic simulations from the CAMELS project spanning a wide variety of cosmological and astrophysical models and generate hundreds of thousands of 2-dimensional maps for 13 different fields: from dark matter to gas and stellar properties. We use these maps to train convolutional neural networks to extract the maximum amount of cosmological information while marginalizing over astrophysical effects at the field level. Although our maps only cover a small area of $(25~h^{-1}{\rm Mpc})^2$, and the different fields are contaminated by astrophysical effects in very different ways, our networks can infer the values of $\Omega_{\rm m}$ and $\sigma_8$ with few-percent precision for most of the fields. We find that the marginalization performed by the network retains a wealth of cosmological information compared to a model trained on maps from gravity-only N-body simulations that are not contaminated by astrophysical effects. Finally, we train our networks on multifields -- 2D maps that contain several fields as different colors or channels -- and find that not only can they infer the value of all parameters with higher accuracy than networks trained on individual fields, but they can also constrain the value of $\Omega_{\rm m}$ with higher accuracy than the maps from the N-body simulations.

* 11 pages, 7 figures. First paper of a series of four. All 2D maps, codes, and network weights publicly available at https://camels-multifield-dataset.readthedocs.io
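
The "multifield" idea in this abstract, several fields of the same region stored as channels of one image, can be sketched as follows. This is only an illustration of the data layout: the field names, the 64x64 map size, the random placeholder values, and the helper `stack_fields` are assumptions made here, not the authors' pipeline or the dataset's actual file format.

```python
import numpy as np


def stack_fields(fields):
    """Stack 2D maps of the same region into one (channels, height, width) array.

    `fields` maps a field name to a 2D array. Channels are ordered by
    sorted field name so the layout is reproducible.
    """
    names = sorted(fields)
    shapes = {fields[name].shape for name in names}
    if len(shapes) != 1:
        raise ValueError(f"all fields must share one shape, got {shapes}")
    return names, np.stack([fields[name] for name in names], axis=0)


# Placeholder maps (hypothetical field names, random values, not CAMELS data).
rng = np.random.default_rng(0)
maps = {
    "gas_density": rng.lognormal(size=(64, 64)),
    "dark_matter": rng.lognormal(size=(64, 64)),
    "stars": rng.lognormal(size=(64, 64)),
}

channel_names, multifield = stack_fields(maps)
print(channel_names, multifield.shape)
```

A convolutional network would then take the stacked array as a multi-channel input, the same way an RGB image is treated as a 3-channel input, which lets it combine information across fields at each pixel.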

Hybrid analytic and machine-learned baryonic property insertion into galactic dark matter haloes

Dec 10, 2020

Ben Moews, Romeel Davé, Sourav Mitra, Sultan Hassan, Weiguang Cui

Abstract: While cosmological dark matter-only simulations relying solely on gravitational effects are comparatively fast to compute, baryonic properties in simulated galaxies require complex hydrodynamic simulations that are computationally costly to run. We explore the merging of an extended version of the equilibrium model, an analytic formalism describing the evolution of the stellar, gas, and metal content of galaxies, into a machine learning framework. In doing so, we are able to recover more properties than the analytic formalism alone can provide, creating a high-speed hydrodynamic simulation emulator that populates galactic dark matter haloes in N-body simulations with baryonic properties. While there exists a trade-off between the accuracy reached and the speed advantage this approach offers, our results outperform an approach using only machine learning for a subset of baryonic properties. We demonstrate that this novel hybrid system enables the fast completion of dark matter-only information by mimicking the properties of a full hydrodynamic suite to a reasonable degree, and discuss the advantages and disadvantages of hybrid versus machine learning-only frameworks. In doing so, we offer an acceleration of commonly deployed simulations in cosmology.

* 13 pages, 8 figures, preprint submitted to MNRAS
