Feature selection is one of the most important pre-processing techniques in machine learning: it reduces the dimensionality of data and helps researchers and practitioners understand the data. As a result, feature selection can be expected to improve performance while reducing computational cost, memory complexity, and even the amount of data. However, only a few studies leverage the power of deep neural networks for feature selection. In this paper, we propose a feature mask module (FM-module) for feature selection based on novel batch-wise attenuation and feature mask normalization. The proposed method is almost free of hyperparameters and can be easily integrated into common neural networks as an embedded feature selection method. Experiments on popular image, text, and speech datasets show that our approach is easy to use and outperforms other state-of-the-art deep-learning-based feature selection methods.