Abstract:Feature selection, as a critical pre-processing step for machine learning, aims at determining representative predictors from a high-dimensional feature space dataset to improve the prediction accuracy. However, the increase in feature space dimensionality, comparing to the number of observations, poses a severe challenge to many existing feature selection methods with respect to computational efficiency and prediction performance. This paper presents a new hybrid two-layer feature selection approach that combines a wrapper and an embedded method in constructing an appropriate subset of predictors. In the first layer of the proposed method, the Genetic Algorithm(GA) has been adopted as a wrapper to search for the optimal subset of predictors, which aims to reduce the number of predictors and the prediction error. As one of the meta-heuristic approaches, GA is selected due to its computational efficiency; however, GAs do not guarantee the optimality. To address this issue, a second layer is added to the proposed method to eliminate any remaining redundant/irrelevant predictors to improve the prediction accuracy. Elastic Net(EN) has been selected as the embedded method in the second layer because of its flexibility in adjusting the penalty terms in regularization process and time efficiency. This hybrid two-layer approach has been applied on a Maize genetic dataset from NAM population, which consists of multiple subsets of datasets with different ratio of the number of predictors to the number of observations. The numerical results confirm the superiority of the proposed model.