Abstract: The exponential growth of astronomical datasets provides an unprecedented opportunity to gain insight into the Universe. However, analyzing this vast amount of data effectively poses a significant challenge. Astronomers are turning to deep learning techniques to address it, but these methods are limited by their specific training sets, which also leads to considerable duplicated effort. As an example of how to overcome this issue, we built a framework for the general analysis of galaxy images, based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratio of galaxy images and the imbalanced distribution of galaxy categories, we have incorporated a human-in-the-loop (HITL) module into our large vision model, which leverages human knowledge interactively to enhance the reliability and interpretability of galaxy image processing. The proposed framework exhibits notable few-shot learning capabilities and adapts to all of the above tasks on galaxy images in the DESI Legacy Imaging Surveys. Specifically, for object detection, when trained on 1000 data points, our DST on top of the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%; for morphology classification, to reach an AUC of ~0.9, LVM plus DST and HITL requires only 1/50 of the training data that ResNet18 needs. We expect that multimodal data can be integrated in a similar way, which opens up possibilities for joint analyses of datasets spanning diverse domains in the era of multi-messenger astronomy.
Abstract: In open-set recognition, existing methods generally use known classes to learn static, fixed decision boundaries with which to reject unknown classes. Although they have achieved promising results, such decision boundaries are insufficient for arbitrary unknown classes in dynamic and open scenarios, because unknown classes can appear at any position in the feature space. Moreover, these methods simply reject unknown-class samples during testing without making any effective use of them. In fact, such samples can serve as true instantiated representations of the unknown classes and thereby further enhance the model's performance. To address these issues, this paper proposes a novel dynamic-against-dynamic idea, i.e., a dynamic method against a dynamically changing open-set world, and develops a corresponding open-set self-learning (OSSL) framework. OSSL starts with a good closed-set classifier trained on known classes and uses the available test samples for model adaptation during testing, thus gaining adaptability to changing data distributions. In particular, a novel self-matching module is designed for OSSL. It performs the adaptation by automatically identifying known-class samples while rejecting unknown-class samples, and the rejected samples are then used as the instantiated representation of unknown classes to enhance the discriminability of the model. Our method establishes new performance milestones on almost all standard and cross-data benchmarks.
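The test-time adaptation loop described above can be illustrated with a toy sketch. This is not the OSSL self-matching module, whose details the abstract does not give. It swaps in a nearest-prototype classifier with a distance threshold, and the names `self_learning_step`, `UNKNOWN`, and the 0.5 update weight are all assumptions made for illustration. It shows only the general idea: samples matched to known classes refine those classes, and rejected samples become an explicit representation of the unknown class for later batches.

```python
import math

UNKNOWN = "unknown"  # hypothetical label for rejected samples


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def self_learning_step(prototypes, batch, threshold):
    """One illustrative test-time adaptation step.

    prototypes: dict mapping label -> prototype vector. It starts with the
                known classes only (the closed-set classifier) and may gain
                an UNKNOWN entry after adaptation.
    batch:      list of unlabeled test feature vectors.
    threshold:  assumed rejection distance for known classes.
    Returns (predictions, updated_prototypes).
    """
    matched = {}
    predictions = []
    for x in batch:
        # Match the sample to its nearest prototype.
        label, d = min(
            ((name, distance(x, p)) for name, p in prototypes.items()),
            key=lambda pair: pair[1],
        )
        # Reject samples that are too far from every known class.
        if label != UNKNOWN and d > threshold:
            label = UNKNOWN
        predictions.append(label)
        matched.setdefault(label, []).append(x)

    # Adapt: move each prototype toward the samples assigned to it, and
    # create the UNKNOWN prototype from rejected samples if it is new.
    updated = dict(prototypes)
    for label, points in matched.items():
        new = centroid(points)
        if label in prototypes:
            old = prototypes[label]
            new = [0.5 * o + 0.5 * n for o, n in zip(old, new)]
        updated[label] = new
    return predictions, updated
```

In this sketch, a later test sample that lies closer to the learned `UNKNOWN` prototype than to any known class is rejected even when a loose threshold alone would have accepted it, which is the sense in which rejected samples improve discriminability.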
Abstract: In the big data era, special data with rare characteristics may be of great significance. However, it is very difficult to search for such samples automatically in massive, high-dimensional datasets and to evaluate them systematically. DoPS, our previous work [2], provided a method for searching for rare spectra with double-peaked profiles in the massive, high-dimensional data of the LAMOST survey. Identification of the results depends mainly on visual inspection by astronomers. In this paper, as a follow-up study, a new lattice structure named SVM-Lattice is designed based on the Support Vector Machine (SVM) and the Formal Concept Lattice (FCL), and is applied in particular to the recognition and evaluation of rare spectra with double-peaked profiles. First, each node in the SVM-Lattice structure contains two components: the intent is defined by the support vectors trained on spectral samples with specific characteristics, and the corresponding extent consists of all the samples classified as positive by those support vectors. The hyperplanes can be extracted from every lattice node and used as classifiers to search for targets by category. The layers express a generalization and specialization relationship, and higher layers indicate higher confidence in the targets. Then, the supporting algorithms are provided and analysed, including an SVM-Lattice building algorithm, a pruning algorithm based on association rules, and an evaluation algorithm. Finally, several data sets from the LAMOST survey are used as experimental data for the recognition and evaluation of spectra with double-peaked profiles. The results show good consistency with traditional methods, more detailed and accurate evaluations of the classification results, and higher search efficiency than other similar methods.
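The node structure described above (an intent of classifiers, an extent of positive samples, and layers that encode confidence) can be sketched as a small data structure. This is one possible reading under stated assumptions, not the paper's building, pruning, or evaluation algorithm. The hyperplanes are set by hand rather than trained by an SVM, a node's layer is assumed to be the number of hyperplanes in its intent, nodes with identical extents are not merged as a true concept lattice would require, and the names `Hyperplane`, `Node`, `build_lattice`, and `confidence` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hyperplane:
    """A linear classifier w.x + b > 0, standing in for a trained SVM."""
    name: str
    w: tuple
    b: float

    def is_positive(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0


@dataclass(frozen=True)
class Node:
    """A lattice node: intent = classifier names, extent = sample ids."""
    intent: frozenset
    extent: frozenset

    @property
    def layer(self):
        # Assumption: more characteristics in the intent = higher layer.
        return len(self.intent)


def build_lattice(planes, samples):
    """Enumerate nodes for every non-empty combination of hyperplanes.

    samples: dict mapping sample id -> feature vector.
    A sample belongs to a node's extent only if every hyperplane in the
    intent classifies it as positive, so larger intents (specialization)
    give smaller extents. Nodes with empty extents are dropped.
    """
    nodes = []
    for k in range(1, len(planes) + 1):
        for combo in combinations(planes, k):
            extent = frozenset(
                sid for sid, x in samples.items()
                if all(p.is_positive(x) for p in combo)
            )
            if extent:
                nodes.append(Node(frozenset(p.name for p in combo), extent))
    return nodes


def confidence(nodes, sample_id):
    """Highest layer whose extent contains the sample (0 if none)."""
    return max((n.layer for n in nodes if sample_id in n.extent), default=0)
```

Under these assumptions, a spectrum accepted by two characteristic classifiers sits in a higher layer than one accepted by a single classifier, which gives a simple ordinal confidence score for ranking candidates.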