Abstract:Class imbalance is one of the challenging problems for machine learning in many real-world applications, such as coal and gas burst accident monitoring: the burst premonition data is extreme smaller than the normal data, however, which is the highlight we truly focus on. Cost-sensitive adjustment approach is a typical algorithm-level method resisting the data set imbalance. For SVMs classifier, which is modified to incorporate varying penalty parameter(C) for each of considered groups of examples. However, the C value is determined empirically, or is calculated according to the evaluation metric, which need to be computed iteratively and time consuming. This paper presents a novel cost-sensitive SVM method whose penalty parameter C optimized on the basis of cluster probability density function(PDF) and the cluster PDF is estimated only according to similarity matrix and some predefined hyper-parameters. Experimental results on various standard benchmark data sets and real-world data with different ratios of imbalance show that the proposed method is effective in comparison with commonly used cost-sensitive techniques.
Abstract:ELM (Extreme Learning Machine) is a single hidden layer feed-forward network, where the weights between input and hidden layer are initialized randomly. ELM is efficient due to its utilization of the analytical approach to compute weights between hidden and output layer. However, ELM still fails to output the semantic classification outcome. To address such limitation, in this paper, we propose a diversified top-k shapelets transform framework, where the shapelets are the subsequences i.e., the best representative and interpretative features of each class. As we identified, the most challenge problems are how to extract the best k shapelets in original candidate sets and how to automatically determine the k value. Specifically, we first define the similar shapelets and diversified top-k shapelets to construct diversity shapelets graph. Then, a novel diversity graph based top-k shapelets extraction algorithm named as \textbf{DivTopkshapelets}\ is proposed to search top-k diversified shapelets. Finally, we propose a shapelets transformed ELM algorithm named as \textbf{DivShapELM} to automatically determine the k value, which is further utilized for time series classification. The experimental results over public data sets demonstrate that the proposed approach significantly outperforms traditional ELM algorithm in terms of effectiveness and efficiency.