Abstract: We often see the term explainable in the titles of papers that describe applications based on artificial intelligence (AI). However, the literature on explainable artificial intelligence (XAI) indicates that explanations in XAI are application- and domain-specific, and therefore require evaluation whenever they are employed to explain a model that makes decisions for a specific application problem. Additionally, the literature reveals that the performance of post-hoc methods, particularly feature attribution methods, varies substantially, hinting that they do not represent a general solution to AI explainability. Therefore, when XAI methods are used, the quality and suitability of their information outputs should be evaluated within the specific application. For these reasons, we used a scoping review methodology to investigate papers that apply AI models and adopt methods to generate post-hoc explanations while referring to those models as explainable. This paper investigates whether the term explainable model is adopted by authors under the assumption that incorporating a post-hoc XAI method suffices to characterize a model as explainable. To examine this problem, our review analyzes whether these papers conducted evaluations. We found that 81% of the application papers that refer to their approach as an explainable model do not conduct any form of evaluation of the XAI method they used.
Abstract: AI methods referred to as interpretable are often discredited as inaccurate by proponents of a trade-off between interpretability and accuracy. In many problem contexts, however, this trade-off does not hold. This paper discusses a regression problem context, predicting flight take-off delays, in which the most accurate data regression model was trained via the XGBoost implementation of gradient boosted decision trees. By building an XGB-CBR Twin and converting the XGBoost feature importance into global weights in the CBR model, the resulting CBR model alone provides the most accurate local predictions, retains the global feature importance to provide a global explanation of the model, and offers the most interpretable representation for local explanations. This CBR model therefore becomes a benchmark of accuracy and interpretability for this problem context and is used to evaluate the two additive feature attribution methods SHAP and LIME in explaining the XGBoost regression model. The results with respect to local accuracy and feature attribution suggest potentially valuable directions for future work.
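To make the pairing of a gradient boosted model with a CBR retrieval concrete, the following is a minimal sketch, not the authors' XGB-CBR Twin implementation: it assumes a synthetic regression dataset in place of the flight take-off delay data, uses XGBoost's built-in feature_importances_ as normalized global weights, and approximates case retrieval with an importance-weighted k-nearest-neighbour average. All hyperparameters and the weighted retrieval function are illustrative assumptions.

```python
# Sketch: XGBoost regressor + importance-weighted CBR-style retrieval (assumed setup).
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flight take-off delay data (assumption).
X, y = make_regression(n_samples=1000, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Train the XGBoost regression model (hyperparameters are illustrative).
model = xgb.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# 2. Convert the XGBoost feature importances into normalized global weights.
weights = model.feature_importances_
weights = weights / weights.sum()

# 3. CBR-style local prediction: retrieve the k most similar training cases under
#    the importance-weighted distance and average their targets.
def cbr_predict(query, case_base, targets, k=5):
    d = np.sqrt((((case_base - query) ** 2) * weights).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return targets[nearest].mean()

query = X_test[0]
print("XGBoost prediction:", model.predict(query.reshape(1, -1))[0])
print("Weighted-retrieval prediction:", cbr_predict(query, X_train, y_train))
```

Under this kind of setup, the retrieved neighbours serve as the interpretable local explanation, and local attributions produced by SHAP or LIME for the XGBoost model could be compared against the importance-weighted retrieval as a reference point.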