Abstract:With the development of a series of Galaxy sky surveys in recent years, the observations increased rapidly, which makes the research of machine learning methods for galaxy image recognition a hot topic. Available automatic galaxy image recognition researches are plagued by the large differences in similarity between categories, the imbalance of data between different classes, and the discrepancy between the discrete representation of Galaxy classes and the essentially gradual changes from one morphological class to the adjacent class (DDRGC). These limitations have motivated several astronomers and machine learning experts to design projects with improved galaxy image recognition capabilities. Therefore, this paper proposes a novel learning method, ``Hierarchical Imbalanced data learning with Weighted sampling and Label smoothing" (HIWL). The HIWL consists of three key techniques respectively dealing with the above-mentioned three problems: (1) Designed a hierarchical galaxy classification model based on an efficient backbone network; (2) Utilized a weighted sampling scheme to deal with the imbalance problem; (3) Adopted a label smoothing technique to alleviate the DDRGC problem. We applied this method to galaxy photometric images from the Galaxy Zoo-The Galaxy Challenge, exploring the recognition of completely round smooth, in between smooth, cigar-shaped, edge-on and spiral. The overall classification accuracy is 96.32\%, and some superiorities of the HIWL are shown based on recall, precision, and F1-Score in comparing with some related works. In addition, we also explored the visualization of the galaxy image features and model attention to understand the foundations of the proposed scheme.
Abstract:We describe a scheme to extract linearly supporting (LSU) features from stellar spectra to automatically estimate the atmospheric parameters $T_{eff}$, log$~g$, and [Fe/H]. "Linearly supporting" means that the atmospheric parameters can be accurately estimated from the extracted features through a linear model. The successive steps of the process are as follow: first, decompose the spectrum using a wavelet packet (WP) and represent it by the derived decomposition coefficients; second, detect representative spectral features from the decomposition coefficients using the proposed method Least Absolute Shrinkage and Selection Operator (LARS)$_{bs}$; third, estimate the atmospheric parameters $T_{eff}$, log$~g$, and [Fe/H] from the detected features using a linear regression method. One prominent characteristic of this scheme is its ability to evaluate quantitatively the contribution of each detected feature to the atmospheric parameter estimate and also to trace back the physical significance of that feature. This work also shows that the usefulness of a component depends on both wavelength and frequency. The proposed scheme has been evaluated on both real spectra from the Sloan Digital Sky Survey (SDSS)/SEGUE and synthetic spectra calculated from Kurucz's NEWODF models. On real spectra, we extracted 23 features to estimate $T_{eff}$, 62 features for log$~g$, and 68 features for [Fe/H]. Test consistencies between our estimates and those provided by the Spectroscopic Sarameter Pipeline of SDSS show that the mean absolute errors (MAEs) are 0.0062 dex for log$~T_{eff}$ (83 K for $T_{eff}$), 0.2345 dex for log$~g$, and 0.1564 dex for [Fe/H]. For the synthetic spectra, the MAE test accuracies are 0.0022 dex for log$~T_{eff}$ (32 K for $T_{eff}$), 0.0337 dex for log$~g$, and 0.0268 dex for [Fe/H].