Abstract:Accurate redshift estimates are a vital component in understanding galaxy evolution and precision cosmology. In this paper, we explore approaches to increase the applicability of machine learning models for photometric redshift estimation on a broader range of galaxy types. Typical models are trained with ground-truth redshifts from spectroscopy. We test the utility and effectiveness of two approaches for combining spectroscopic redshifts and redshifts derived from multiband ($\sim$35 filters) photometry, which sample different types of galaxies compared to spectroscopic surveys. The two approaches are (1) training on a composite dataset and (2) transfer learning from one dataset to another. We compile photometric redshifts from the COSMOS2020 catalog (TransferZ) to complement an established spectroscopic redshift dataset (GalaxiesML). We used two architectures, deterministic neural networks (NN) and Bayesian neural networks (BNN), to examine and evaluate their performance with respect to the Legacy Survey of Space and Time (LSST) photo-$z$ science requirements. We also use split conformal prediction for calibrating uncertainty estimates and producing prediction intervals for the BNN and NN, respectively. We find that a NN trained on a composite dataset predicts photo-$z$'s that are 4.5 times less biased within the redshift range $0.3<z<1.5$, 1.1 times less scattered, and has a 1.4 times lower outlier rate than a model trained on only spectroscopic ground truths. We also find that BNNs produce reliable uncertainty estimates, but are sensitive to the different ground truths. This investigation leverages different sources of ground truths to develop models that can accurately predict photo-$z$'s for a broader population of galaxies crucial for surveys such as Euclid and LSST.




Abstract:In this work, we explore methods to improve galaxy redshift predictions by combining different ground truths. Traditional machine learning models rely on training sets with known spectroscopic redshifts, which are precise but only represent a limited sample of galaxies. To make redshift models more generalizable to the broader galaxy population, we investigate transfer learning and directly combining ground truth redshifts derived from photometry and spectroscopy. We use the COSMOS2020 survey to create a dataset, TransferZ, which includes photometric redshift estimates derived from up to 35 imaging filters using template fitting. This dataset spans a wider range of galaxy types and colors compared to spectroscopic samples, though its redshift estimates are less accurate. We first train a base neural network on TransferZ and then refine it using transfer learning on a dataset of galaxies with more precise spectroscopic redshifts (GalaxiesML). In addition, we train a neural network on a combined dataset of TransferZ and GalaxiesML. Both methods reduce bias by $\sim$ 5x, RMS error by $\sim$ 1.5x, and catastrophic outlier rates by 1.3x on GalaxiesML, compared to a baseline trained only on TransferZ. However, we also find a reduction in performance for RMS and bias when evaluated on TransferZ data. Overall, our results demonstrate these approaches can meet cosmological requirements.