Abstract: Interpretability is important for many applications of machine learning to signal data. It covers how well a model fits the data, how accurately explanations are drawn from it, and how well people can understand those explanations. Feature extraction and selection can improve model interpretability by identifying structures in the data that are both informative and intuitively meaningful. To this end, we propose a signal classification framework that combines feature extraction with feature selection using the knockoff filter, a method that provides guarantees on the false discovery rate (FDR) among the selected features. We apply this framework to a dataset of Raman spectroscopy measurements from bacterial samples. Using a wavelet-based feature representation of the data and a logistic regression classifier, our framework achieves significantly higher predictive accuracy than when the original features are used as input. We also benchmark against features obtained through principal component analysis and against a neural network-based classifier applied to the original features. Our framework outperforms the former and performs comparably to the latter, while offering the advantage of a more compact and human-interpretable set of features.
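
To make the FDR-controlled selection step concrete, the following is a minimal sketch of the data-dependent threshold used in the standard "knockoff+" formulation of the knockoff filter. The abstract does not state which knockoff variant or feature statistic is used, so this is an illustration under assumptions: the statistics `W` (one per feature, large positive values indicating evidence for the real feature over its knockoff copy) are assumed to have been computed elsewhere, and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Return the knockoff+ threshold for statistics W at target FDR level q.

    The threshold is the smallest t among the nonzero |W_j| such that
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
    Returns np.inf if no candidate satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, in ascending order.
    candidates = np.unique(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return float(t)
    return np.inf


def knockoff_select(W, q):
    """Return indices of features whose statistic meets the knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    t = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= t)


# Illustrative (made-up) statistics for ten features.
W_demo = [5, 4, 3, 2.5, -1, 2, 0.5, -0.2, 6, 7]
selected = knockoff_select(W_demo, q=0.2)
```

In a pipeline like the one described, the selected indices would correspond to wavelet-based features passed on to the logistic regression classifier; how `W` is constructed (for example, from coefficient magnitudes of a penalized regression on features and their knockoffs) is a separate design choice not specified here.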