Abstract:Explaining machine learning (ML) models using eXplainable AI (XAI) techniques has become essential to make them more transparent and trustworthy. This is especially important in high-stakes domains like healthcare, where understanding model decisions is critical to ensure ethical, sound, and trustworthy outcome predictions. However, users are often confused about which explanability method to choose for their specific use case. We present a comparative analysis of widely used explainability methods, Shapley Additive Explanations (SHAP) and Gradient-weighted Class Activation Mapping (GradCAM), within the domain of human activity recognition (HAR) utilizing graph convolutional networks (GCNs). By evaluating these methods on skeleton-based data from two real-world datasets, including a healthcare-critical cerebral palsy (CP) case, this study provides vital insights into both approaches' strengths, limitations, and differences, offering a roadmap for selecting the most appropriate explanation method based on specific models and applications. We quantitatively and quantitatively compare these methods, focusing on feature importance ranking, interpretability, and model sensitivity through perturbation experiments. While SHAP provides detailed input feature attribution, GradCAM delivers faster, spatially oriented explanations, making both methods complementary depending on the application's requirements. Given the importance of XAI in enhancing trust and transparency in ML models, particularly in sensitive environments like healthcare, our research demonstrates how SHAP and GradCAM could complement each other to provide more interpretable and actionable model explanations.
Abstract:In Human Activity Recognition (HAR), understanding the intricacy of body movements within high-risk applications is essential. This study uses SHapley Additive exPlanations (SHAP) to explain the decision-making process of Graph Convolution Networks (GCNs) when classifying activities with skeleton data. We employ SHAP to explain two real-world datasets: one for cerebral palsy (CP) classification and the widely used NTU RGB+D 60 action recognition dataset. To test the explanation, we introduce a novel perturbation approach that modifies the model's edge importance matrix, allowing us to evaluate the impact of specific body key points on prediction outcomes. To assess the fidelity of our explanations, we employ informed perturbation, targeting body key points identified as important by SHAP and comparing them against random perturbation as a control condition. This perturbation enables a judgment on whether the body key points are truly influential or non-influential based on the SHAP values. Results on both datasets show that body key points identified as important through SHAP have the largest influence on the accuracy, specificity, and sensitivity metrics. Our findings highlight that SHAP can provide granular insights into the input feature contribution to the prediction outcome of GCNs in HAR tasks. This demonstrates the potential for more interpretable and trustworthy models in high-stakes applications like healthcare or rehabilitation.
Abstract:The advancement of deep learning in human activity recognition (HAR) using 3D skeleton data is critical for applications in healthcare, security, sports, and human-computer interaction. This paper tackles a well-known gap in the field, which is the lack of testing in the applicability and reliability of XAI evaluation metrics in the skeleton-based HAR domain. We have tested established XAI metrics namely faithfulness and stability on Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (Grad-CAM) to address this problem. The study also introduces a perturbation method that respects human biomechanical constraints to ensure realistic variations in human movement. Our findings indicate that \textit{faithfulness} may not be a reliable metric in certain contexts, such as with the EfficientGCN model. Conversely, stability emerges as a more dependable metric when there is slight input data perturbations. CAM and Grad-CAM are also found to produce almost identical explanations, leading to very similar XAI metric performance. This calls for the need for more diversified metrics and new XAI methods applied in skeleton-based HAR.
Abstract:This paper introduces AutoGCN, a generic Neural Architecture Search (NAS) algorithm for Human Activity Recognition (HAR) using Graph Convolution Networks (GCNs). HAR has gained attention due to advances in deep learning, increased data availability, and enhanced computational capabilities. At the same time, GCNs have shown promising results in modeling relationships between body key points in a skeletal graph. While domain experts often craft dataset-specific GCN-based methods, their applicability beyond this specific context is severely limited. AutoGCN seeks to address this limitation by simultaneously searching for the ideal hyperparameters and architecture combination within a versatile search space using a reinforcement controller while balancing optimal exploration and exploitation behavior with a knowledge reservoir during the search process. We conduct extensive experiments on two large-scale datasets focused on skeleton-based action recognition to assess the proposed algorithm's performance. Our experimental results underscore the effectiveness of AutoGCN in constructing optimal GCN architectures for HAR, outperforming conventional NAS and GCN methods, as well as random search. These findings highlight the significance of a diverse search space and an expressive input representation to enhance the network performance and generalizability.