Abstract: While machine-learned models are now routinely employed to facilitate astronomical inquiry, model inputs tend to be limited to a primary data source (namely images or time series) and, in the more advanced approaches, some metadata. Yet with the growing use of wide-field, multiplexed observational resources, individual sources of interest often have a broad range of observational modes available. Here we construct an astronomical multimodal dataset and propose AstroM$^3$, a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. Specifically, we extend the CLIP (Contrastive Language-Image Pretraining) model to a trimodal setting, allowing the integration of time-series photometry data, spectra, and astrophysical metadata. In a fine-tuning supervised setting, our results demonstrate that CLIP pre-training improves classification performance for time-series photometry, where accuracy increases from 84.6% to 91.5%. Furthermore, CLIP boosts classification accuracy by up to 12.6% when the availability of labeled data is limited, showing the effectiveness of leveraging larger corpora of unlabeled data. In addition to fine-tuned classification, the trained model can be used for other downstream tasks that were not explicitly contemplated during the construction of the self-supervised model. In particular, we show the efficacy of using the learned embeddings for misclassification identification, similarity search, and anomaly detection. One surprising highlight is the "rediscovery" of Mira subtypes and two rotational variable subclasses using manifold learning and dimensionality reduction algorithms. To our knowledge, this is the first construction of an $n>2$ mode model in astronomy. Extensions to $n>3$ modes are naturally anticipated with this approach.
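The abstract describes extending CLIP's pairwise contrastive objective to three modalities. One natural reading, offered here only as a minimal sketch and not the authors' released code, is to sum symmetric InfoNCE losses over the three modality pairs; the encoder outputs, projection dimension, and temperature below are illustrative assumptions.

```python
# Sketch of a trimodal CLIP-style objective over photometry, spectra, and
# metadata embeddings. All hyperparameters here are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_pair_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of aligned embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                   # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device) # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def trimodal_loss(z_phot, z_spec, z_meta):
    """Sum of pairwise CLIP losses over (photometry, spectrum, metadata) triples."""
    return (clip_pair_loss(z_phot, z_spec) +
            clip_pair_loss(z_phot, z_meta) +
            clip_pair_loss(z_spec, z_meta))

# Usage: z_* are modality-specific encoder outputs projected to a shared space.
# z_phot, z_spec, z_meta = (torch.randn(32, 128) for _ in range(3))
# loss = trimodal_loss(z_phot, z_spec, z_meta)
```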
Abstract: Ragnarock is a virtual reality (VR) rhythm game in which the player takes the role of a Viking captain competing in a longship race. Wielding two hammers, the player must crush incoming runes in sync with epic Viking music. The runes are defined by a beat map, which players can create manually; building a beat map by hand takes hours. This work aims to automate beat map creation, a problem also known as learning to choreograph. The task is broken down into three parts: determining the timing of the beats (action placement), determining where in space the runes associated with those beats should be placed (action selection), and building a web application. For action placement, predominant local pulse (PLP) information is extracted from the music recordings; this approach learns where beats should be placed and how many there should be. For action selection, a recurrent neural network (RNN), specifically a gated recurrent unit (GRU), is used to learn sequences of beats and their patterns, so that those rules can be reproduced to generate entirely new levels. The final task was to make the solution accessible to non-technical players by combining the results of the first two parts into an easy-to-use web application, with a frontend built in JavaScript and React and a backend in Python and FastAPI.
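As a hedged illustration of the action-placement step, the snippet below shows one way PLP-based beat extraction can be done with librosa; the file name, hop length, and peak-picking rule are assumptions rather than the exact configuration used in this work.

```python
# Minimal sketch of action placement via librosa's predominant local pulse (PLP).
import numpy as np
import librosa

def extract_beat_times(audio_path, hop_length=512):
    """Return estimated beat times (in seconds) from the PLP curve of a recording."""
    y, sr = librosa.load(audio_path)
    # The onset strength envelope drives the PLP computation.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    pulse = librosa.beat.plp(onset_envelope=onset_env, sr=sr, hop_length=hop_length)
    # Local maxima of the pulse curve mark candidate beat positions.
    beat_frames = np.flatnonzero(librosa.util.localmax(pulse))
    return librosa.frames_to_time(beat_frames, sr=sr, hop_length=hop_length)

# beat_times = extract_beat_times("song.ogg")
# These timestamps would then feed the GRU-based action-selection model.
```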