Abstract: Feature selection is one of the most important tools for understanding data and machine learning models. Among other methods, sparsity induced by an $L^{1}$ penalty is one of the simplest and best-studied approaches to this problem. Although such regularization is frequently used in neural networks to achieve sparsity of weights or unit activations, it is unclear how it can be employed for feature selection. This work aims to extend neural networks with the ability to select features automatically by rethinking how the sparsity regularization is applied, namely, by stochastically penalizing feature involvement instead of the layer weights. The proposed method demonstrated superior efficiency compared to several classical methods, achieved with minimal or no computational overhead, and can be applied directly to any existing architecture. Furthermore, the method generalizes easily to neuron pruning and to the selection of regions of importance in spectral data.
Abstract: Binary Stochastic Filtering (BSF), an algorithm for feature selection and neuron pruning, is proposed in this work. The filtering layer stochastically passes or filters out features based on individual weights, which are tuned during neural network training. Placing BSF directly after the network input filters the input features, i.e., performs feature selection; more than a 5-fold decrease in dimensionality was achieved in the experiments. Placing a BSF layer between hidden layers filters neuron outputs and can be used for neuron pruning. Up to a 34-fold decrease in the number of network weights was reached, which corresponds to a significant performance increase that is especially important for mobile and embedded applications.
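The abstracts describe a filtering layer with per-feature trainable weights that stochastically pass or block each feature. The sketch below illustrates one plausible realization of such a layer, assuming a Bernoulli gate per feature parameterized through a sigmoid and trained with a straight-through gradient estimator; the class name `StochasticFilter`, this parameterization, and the penalty form are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of a stochastic gating layer in the spirit of BSF (assumed
# implementation): each feature is kept or zeroed with a trainable probability,
# and an L1-style penalty acts on feature involvement rather than layer weights.
import torch
import torch.nn as nn


class StochasticFilter(nn.Module):
    """Passes or zeroes each feature according to a per-feature keep probability."""

    def __init__(self, num_features: int):
        super().__init__()
        # Logits of the per-feature keep probabilities (hypothetical parameterization).
        self.logits = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.logits)                  # keep probabilities in (0, 1)
        if self.training:
            hard = torch.bernoulli(p.expand_as(x))      # stochastic binary mask
            # Straight-through estimator: binary mask in the forward pass,
            # gradients flow through p in the backward pass.
            mask = hard + p - p.detach()
        else:
            mask = (p > 0.5).float()                    # deterministic mask at inference
        return x * mask

    def sparsity_penalty(self) -> torch.Tensor:
        # Penalize expected feature involvement (sum of keep probabilities).
        return torch.sigmoid(self.logits).sum()


# Usage: place the filter right after the input for feature selection,
# or between hidden layers for neuron pruning.
model = nn.Sequential(StochasticFilter(30), nn.Linear(30, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(8, 30)
logits = model(x)
penalty = model[0].sparsity_penalty()   # add (scaled) to the task loss during training
```

Features whose keep probabilities collapse toward zero under the penalty can be discarded after training, which is how a dimensionality or weight-count reduction of the kind reported above would be obtained.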