Abstract:In this paper, we investigate the effect of addressing difficult samples from a given text dataset on the downstream text classification task. We define difficult samples as being non-obvious cases for text classification by analysing them in the semantic embedding space; specifically - (i) semantically similar samples that belong to different classes and (ii) semantically dissimilar samples that belong to the same class. We propose a penalty function to measure the overall difficulty score of every sample in the dataset. We conduct exhaustive experiments on 13 standard datasets to show a consistent improvement of up to 9% and discuss qualitative results to show effectiveness of our approach in identifying difficult samples for a text classification model.
Abstract:In Brain Computer Interface (BCI), data generated from Electroencephalogram (EEG) is non-stationary with low signal to noise ratio and contaminated with artifacts. Common Spatial Pattern (CSP) algorithm has been proved to be effective in BCI for extracting features in motor imagery tasks, but it is prone to overfitting. Many algorithms have been devised to regularize CSP for two class problem, however they have not been effective when applied to multiclass CSP. Outliers present in data affect extracted CSP features and reduces performance of the system. In addition to this non-stationarity present in the features extracted from the CSP present a challenge in classification. We propose a method to identify and remove artifact present in the data during pre-processing stage, this helps in calculating eigenvectors which in turn generates better CSP features. To handle the non-stationarity, Self-Regulated Interval Type-2 Neuro-Fuzzy Inference System (SRIT2NFIS) was proposed in the literature for two class EEG classification problem. This paper extends the SRIT2NFIS to multiclass using Joint Approximate Diagonalization (JAD). The results on standard data set from BCI competition IV shows significant increase in the accuracies from the current state of the art methods for multiclass classification.