Abstract:The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.
Abstract:Artificial intelligence methods show great promise in increasing the quality and speed of work with large astronomical datasets, but the high complexity of these methods leads to the extraction of dataset-specific, non-robust features. Therefore, such methods do not generalize well across multiple datasets. We present a universal domain adaptation method, \textit{DeepAstroUDA}, as an approach to overcome this challenge. This algorithm performs semi-supervised domain adaptation and can be applied to datasets with different data distributions and class overlaps. Non-overlapping classes can be present in any of the two datasets (the labeled source domain, or the unlabeled target domain), and the method can even be used in the presence of unknown classes. We apply our method to three examples of galaxy morphology classification tasks of different complexities ($3$-class and $10$-class problems), with anomaly detection: 1) datasets created after different numbers of observing years from a single survey (LSST mock data of $1$ and $10$ years of observations); 2) data from different surveys (SDSS and DECaLS); and 3) data from observing fields with different depths within one survey (wide field and Stripe 82 deep field of SDSS). For the first time, we demonstrate the successful use of domain adaptation between very discrepant observational datasets. \textit{DeepAstroUDA} is capable of bridging the gap between two astronomical surveys, increasing classification accuracy in both domains (up to $40\%$ on the unlabeled data), and making model performance consistent across datasets. Furthermore, our method also performs well as an anomaly detection algorithm and successfully clusters unknown class samples even in the unlabeled target dataset.
Abstract:Deep learning models are being increasingly adopted in wide array of scientific domains, especially to handle high-dimensionality and volume of the scientific data. However, these models tend to be brittle due to their complexity and overparametrization, especially to the inadvertent adversarial perturbations that can appear due to common image processing such as compression or blurring that are often seen with real scientific data. It is crucial to understand this brittleness and develop models robust to these adversarial perturbations. To this end, we study the effect of observational noise from the exposure time, as well as the worst case scenario of a one-pixel attack as a proxy for compression or telescope errors on performance of ResNet18 trained to distinguish between galaxies of different morphologies in LSST mock data. We also explore how domain adaptation techniques can help improve model robustness in case of this type of naturally occurring attacks and help scientists build more trustworthy and stable models.
Abstract:In astronomy, neural networks are often trained on simulation data with the prospect of being used on telescope observations. Unfortunately, training a model on simulation data and then applying it to instrument data leads to a substantial and potentially even detrimental decrease in model accuracy on the new target dataset. Simulated and instrument data represent different data domains, and for an algorithm to work in both, domain-invariant learning is necessary. Here we employ domain adaptation techniques$-$ Maximum Mean Discrepancy (MMD) as an additional transfer loss and Domain Adversarial Neural Networks (DANNs)$-$ and demonstrate their viability to extract domain-invariant features within the astronomical context of classifying merging and non-merging galaxies. Additionally, we explore the use of Fisher loss and entropy minimization to enforce better in-domain class discriminability. We show that the addition of each domain adaptation technique improves the performance of a classifier when compared to conventional deep learning algorithms. We demonstrate this on two examples: between two Illustris-1 simulated datasets of distant merging galaxies, and between Illustris-1 simulated data of nearby merging galaxies and observed data from the Sloan Digital Sky Survey. The use of domain adaptation techniques in our experiments leads to an increase of target domain classification accuracy of up to ${\sim}20\%$. With further development, these techniques will allow astronomers to successfully implement neural network models trained on simulation data to efficiently detect and study astrophysical objects in current and future large-scale astronomical surveys.
Abstract:In astronomy, neural networks are often trained on simulated data with the prospect of being applied to real observations. Unfortunately, simply training a deep neural network on images from one domain does not guarantee satisfactory performance on new images from a different domain. The ability to share cross-domain knowledge is the main advantage of modern deep domain adaptation techniques. Here we demonstrate the use of two techniques - Maximum Mean Discrepancy (MMD) and adversarial training with Domain Adversarial Neural Networks (DANN) - for the classification of distant galaxy mergers from the Illustris-1 simulation, where the two domains presented differ only due to inclusion of observational noise. We show how the addition of either MMD or adversarial training greatly improves the performance of the classifier on the target domain when compared to conventional machine learning algorithms, thereby demonstrating great promise for their use in astronomy.
Abstract:We present a response to the 2018 Request for Information (RFI) from the NITRD, NCO, NSF regarding the "Update to the 2016 National Artificial Intelligence Research and Development Strategic Plan." Through this document, we provide a response to the question of whether and how the National Artificial Intelligence Research and Development Strategic Plan (NAIRDSP) should be updated from the perspective of Fermilab, America's premier national laboratory for High Energy Physics (HEP). We believe the NAIRDSP should be extended in light of the rapid pace of development and innovation in the field of Artificial Intelligence (AI) since 2016, and present our recommendations below. AI has profoundly impacted many areas of human life, promising to dramatically reshape society --- e.g., economy, education, science --- in the coming years. We are still early in this process. It is critical to invest now in this technology to ensure it is safe and deployed ethically. Science and society both have a strong need for accuracy, efficiency, transparency, and accountability in algorithms, making investments in scientific AI particularly valuable. Thus far the US has been a leader in AI technologies, and we believe as a national Laboratory it is crucial to help maintain and extend this leadership. Moreover, investments in AI will be important for maintaining US leadership in the physical sciences.