The LSST Dark Energy Science Collaboration and the COIN collaboration
Abstract:SNAD is an international project with a primary focus on detecting astronomical anomalies within large-scale surveys, using active learning and other machine learning algorithms. The work carried out by SNAD not only contributes to the discovery and classification of various astronomical phenomena but also enhances our understanding and implementation of machine learning techniques within the field of astrophysics. This paper provides a review of the SNAD project and summarizes the advancements and achievements made by the team over several years.
Abstract:The recent increase in volume and complexity of available astronomical data has led to a wide use of supervised machine learning techniques. Active learning strategies have been proposed as an alternative to optimize the distribution of scarce labeling resources. However, due to the specific conditions in which labels can be acquired, fundamental assumptions, such as sample representativeness and labeling cost stability cannot be fulfilled. The Recommendation System for Spectroscopic follow-up (RESSPECT) project aims to enable the construction of optimized training samples for the Rubin Observatory Legacy Survey of Space and Time (LSST), taking into account a realistic description of the astronomical data environment. In this work, we test the robustness of active learning techniques in a realistic simulated astronomical data scenario. Our experiment takes into account the evolution of training and pool samples, different costs per object, and two different sources of budget. Results show that traditional active learning strategies significantly outperform random sampling. Nevertheless, more complex batch strategies are not able to significantly overcome simple uncertainty sampling techniques. Our findings illustrate three important points: 1) active learning strategies are a powerful tool to optimize the label-acquisition task in astronomy, 2) for upcoming large surveys like LSST, such techniques allow us to tailor the construction of the training sample for the first day of the survey, and 3) the peculiar data environment related to the detection of astronomical transients is a fertile ground that calls for the development of tailored machine learning algorithms.
Abstract:We present the first application of adaptive machine learning to the identification of anomalies in a data set of non-periodic astronomical light curves. The method follows an active learning strategy where highly informative objects are selected to be labelled. This new information is subsequently used to improve the machine learning model, allowing its accuracy to evolve with the addition of every new classification. For the case of anomaly detection, the algorithm aims to maximize the number of real anomalies presented to the expert by slightly modifying the decision boundary of a traditional isolation forest in each iteration. As a proof of concept, we apply the Active Anomaly Discovery (AAD) algorithm to light curves from the Open Supernova Catalog and compare its results to those of a static Isolation Forest (IF). For both methods, we visually inspected objects within 2% highest anomaly scores. We show that AAD was able to identify 80% more true anomalies than IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.