Abstract: As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimens are often available only in small sample sizes and therefore require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature extraction in the pathological diagnosis of cardiomyopathy. Furthermore, model designs that contribute toward improving generalization performance are examined by applying feature selection (FS) and dimensional compression (DC) to several ML models. The results were verified by visualizing the inter-class distribution differences of the texture features and conducting statistical hypothesis testing on them. Additionally, they were evaluated in terms of predictive performance and decision boundaries across different model designs with varying combinations of FS and DC (applied or not). The results indicated that texture features may be effective for the pathological diagnosis of cardiomyopathy. Moreover, when the ratio of features to the sample size is high, a multi-step process involving FS and DC improved the generalization performance, with the linear kernel support vector machine achieving the best results. This process was shown to be potentially effective for models with reduced complexity, regardless of whether the decision boundaries were linear, curved, perpendicular, or parallel to the axes. These findings are expected to facilitate the development of an effective cardiomyopathy diagnostic model and its rapid adoption in medical practice.
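As a rough illustration of hypothesis-test-based feature selection on a small sample, the following sketch ranks features by the absolute Welch t-statistic between two classes and keeps the top k. The abstract does not name the test or the FS method used in the study, so the choice of Welch's t-statistic, the function names `welch_t` and `select_features`, and the toy data are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances).

    Assumes each sample has at least two values and non-zero pooled variance.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def select_features(X, y, k):
    """Return the indices of the k features with the largest |t| between classes 0 and 1."""
    scores = []
    for j in range(len(X[0])):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(class0, class1)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0], [3.0, 5.5], [3.1, 4.5], [2.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]
selected = select_features(X, y, k=1)
print(selected)
```

In a multi-step design of the kind the abstract describes, the retained features would then be passed to a dimensional compression step and a classifier; those stages are omitted here.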
Abstract: Decision trees offer the benefit of easy interpretation because they classify input data based on if-then rules. However, because decision trees are constructed by an algorithm that achieves clear classification with the minimum necessary rules, they have the drawback of extracting only a minimal set of rules, even when various latent rules exist in the data. Approaches that construct multiple trees using randomly selected feature subsets do exist. However, the number of trees that can be constructed remains at the same scale, because the number of feature subsets explodes combinatorially. Additionally, when multiple trees are constructed, numerous rules are generated, several of which are untrustworthy and/or highly similar. Therefore, we propose the "MAABO-MT" and "GS-MRM" algorithms, which, respectively, strategically construct trees with high estimation performance among all possible trees at a small computational cost and extract only reliable and non-similar rules. Experiments are conducted using several open datasets to analyze the effectiveness of the proposed method. The results confirm that MAABO-MT can discover reliable rules at a lower computational cost than other methods that rely on randomness. Furthermore, the proposed method is confirmed to provide deeper insights than the single decision trees commonly used in previous studies. Therefore, MAABO-MT and GS-MRM can efficiently extract rules from combinatorially exploded decision trees.
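To make the idea of keeping only reliable and non-similar rules concrete, here is a minimal sketch. It is not the GS-MRM algorithm, whose details the abstract does not give. It assumes each rule is a dictionary with hypothetical fields `conditions`, `label`, `support`, and `confidence`, treats "reliable" as meeting support and confidence thresholds, and measures similarity with the Jaccard index over condition sets. All thresholds and example rules are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two non-empty sets of rule conditions."""
    return len(a & b) / len(a | b)


def filter_rules(rules, min_support=10, min_confidence=0.8, max_similarity=0.5):
    """Keep reliable rules, then greedily drop rules too similar to a kept one.

    A rule is dropped when an already kept rule predicts the same label and
    their condition sets have Jaccard similarity above max_similarity.
    """
    reliable = [
        r for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]
    reliable.sort(key=lambda r: r["confidence"], reverse=True)
    kept = []
    for rule in reliable:
        is_novel = all(
            k["label"] != rule["label"]
            or jaccard(k["conditions"], rule["conditions"]) <= max_similarity
            for k in kept
        )
        if is_novel:
            kept.append(rule)
    return kept


# Hypothetical rules, as might be collected from several trees.
rules = [
    {"name": "r1", "conditions": frozenset({"age>50", "bmi>30"}),
     "label": "high", "support": 40, "confidence": 0.90},
    {"name": "r2", "conditions": frozenset({"age>50", "bmi>30", "smoker"}),
     "label": "high", "support": 25, "confidence": 0.85},
    {"name": "r3", "conditions": frozenset({"age<=50"}),
     "label": "low", "support": 60, "confidence": 0.88},
    {"name": "r4", "conditions": frozenset({"smoker"}),
     "label": "high", "support": 3, "confidence": 1.00},
]
kept_names = [r["name"] for r in filter_rules(rules)]
print(kept_names)
```

Here `r4` is rejected for low support and `r2` for overlapping too heavily with the higher-confidence `r1`.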
Abstract: Recently, many convolutional neural networks (CNNs) have been developed for classifying time-domain data comprising multiple signals. Although some signals are important for correct classification, others are not. When signals that are not important for classification are included in the CNN input layer, the calculation, memory, and data collection costs increase. Therefore, identifying and eliminating nonimportant signals from the input layer is important. In this study, we propose the features gradient-based signals selection algorithm (FG-SSA), which finds and removes signals that are not important for classification by utilizing the features gradient obtained in the calculation process of Grad-CAM. With N defined as the number of signals, the computational complexity of the proposed algorithm is linear, O(N); that is, it has a low calculation cost. We verified the effectiveness of the algorithm using the OPPORTUNITY Activity Recognition dataset, which is an open dataset comprising acceleration signals of human activities. In addition, we confirmed that FG-SSA removed an average of 6.55 of the 15 acceleration signals (five triaxial sensors) while maintaining high generalization scores of classification. Therefore, the proposed FG-SSA is effective for finding and removing signals that are not important for CNN-based classification.
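The following sketch shows one way a signal-removal loop with a linear number of model evaluations could look. It is not the FG-SSA procedure itself. It assumes a per-signal importance score is already available (in FG-SSA this role is played by the features gradient from Grad-CAM) and that a hypothetical `evaluate` callback returns a validation score for a given set of signals. The signal names, scores, and tolerance are invented for illustration.

```python
def select_signals(importance, evaluate, tolerance=0.01):
    """Remove signals in ascending order of importance while the score holds.

    importance: dict mapping signal name to an importance score.
    evaluate:   callable taking a set of signal names and returning a score.
    Stops at the first removal that lowers the score by more than `tolerance`
    relative to the all-signals baseline, so `evaluate` is called at most
    N + 1 times for N signals.
    """
    active = set(importance)
    baseline = evaluate(active)
    for name in sorted(importance, key=importance.get):
        if len(active) == 1:
            break  # always keep at least one signal
        candidate = active - {name}
        if evaluate(candidate) >= baseline - tolerance:
            active = candidate
        else:
            break
    return active


def toy_evaluate(signals):
    """Stand-in for retraining and validating a CNN (hypothetical scores)."""
    return 0.95 if {"s0", "s2"} <= signals else 0.60


# Hypothetical importance scores for four signals.
importance = {"s0": 0.90, "s1": 0.05, "s2": 0.60, "s3": 0.02}
kept = select_signals(importance, toy_evaluate)
print(sorted(kept))
```

In practice `evaluate` would dominate the cost, which is why bounding the number of calls by the number of signals matters.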