Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vural Aksakalli

SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

Apr 16, 2018

Zeren D. Yenice, Niranjan Adhikari, Yong Kai Wong, Vural Aksakalli, Alev Taskin Gumus, Babak Abbasi

Figure 1 for SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

Figure 2 for SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

Figure 3 for SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

Figure 4 for SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

Abstract:This manuscript presents the following: (1) an improved version of the Binary Simultaneous Perturbation Stochastic Approximation (SPSA) Method for feature selection in machine learning (Aksakalli and Malekipirbazari, Pattern Recognition Letters, Vol. 75, 2016) based on non-monotone iteration gains computed via the Barzilai and Borwein (BB) method, (2) its adaptation for feature ranking, and (3) comparison against popular methods on public benchmark datasets. The improved method, which we call SPSA-FSR, dramatically reduces the number of iterations required for convergence without impacting solution quality. SPSA-FSR can be used for feature ranking and feature selection both for classification and regression problems. After a review of the current state-of-the-art, we discuss our improvements in detail and present three sets of computational experiments: (1) comparison of SPSA-FS as a (wrapper) feature selection method against sequential methods as well as genetic algorithms, (2) comparison of SPSA-FS as a feature ranking method in a classification setting against random forest importance, chi-squared, and information main methods, and (3) comparison of SPSA-FS as a feature ranking method in a regression setting against minimum redundancy maximum relevance (MRMR), RELIEF, and linear correlation methods. The number of features in the datasets we use range from a few dozens to a few thousands. Our results indicate that SPSA-FS converges to a good feature set in no more than 100 iterations and therefore it is quite fast for a wrapper method. SPSA-FS also outperforms popular feature selection as well as feature ranking methods in majority of test cases, sometimes by a large margin, and it stands as a promising new feature selection and ranking method.

* The methodology introduced in this manuscript, both for feature selection and feature ranking, has been implemented as the "spFSR" R package

Via

Access Paper or Ask Questions

Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Mar 05, 2016

Vural Aksakalli, Milad Malekipirbazari

Figure 1 for Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Figure 2 for Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Figure 3 for Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Figure 4 for Feature Selection via Binary Simultaneous Perturbation Stochastic Approximation

Abstract:Feature selection (FS) has become an indispensable task in dealing with today's highly complex pattern recognition problems with massive number of features. In this study, we propose a new wrapper approach for FS based on binary simultaneous perturbation stochastic approximation (BSPSA). This pseudo-gradient descent stochastic algorithm starts with an initial feature vector and moves toward the optimal feature vector via successive iterations. In each iteration, the current feature vector's individual components are perturbed simultaneously by random offsets from a qualified probability distribution. We present computational experiments on datasets with numbers of features ranging from a few dozens to thousands using three widely-used classifiers as wrappers: nearest neighbor, decision tree, and linear support vector machine. We compare our methodology against the full set of features as well as a binary genetic algorithm and sequential FS methods using cross-validated classification error rate and AUC as the performance criteria. Our results indicate that features selected by BSPSA compare favorably to alternative methods in general and BSPSA can yield superior feature sets for datasets with tens of thousands of features by examining an extremely small fraction of the solution space. We are not aware of any other wrapper FS methods that are computationally feasible with good convergence properties for such large datasets.

* This is the Istanbul Sehir University Technical Report #SHR-ISE-2016.01. A short version of this report has been accepted for publication at Pattern Recognition Letters

Via

Access Paper or Ask Questions