Abstract:The fractal dimension provides a statistical index of object complexity by studying how the pattern changes with the measuring scale. Although useful in several classification tasks, the fractal dimension is under-explored in deep learning applications. In this work, we investigate the features that are learned by deep models and we study whether these deep networks are able to encode features as complex and high-level as the fractal dimensions. Specifically, we conduct a correlation analysis experiment to show that deep networks are not able to extract such a feature in none of their layers. We combine our analytical study with a human evaluation to investigate the differences between deep learning networks and models that operate on the fractal feature solely. Moreover, we show the effectiveness of fractal features in applications where the object structure is crucial for the classification task. We empirically show that training a shallow network on fractal features achieves performance comparable, even superior in specific cases, to that of deep networks trained on raw data while requiring less computational resources. Fractals improved the accuracy of the classification by 30% on average while requiring up to 84% less time to train. We couple our empirical study with a complexity analysis of the computational cost of extracting the proposed fractal features, and we study its limitation.
Abstract:Building an accurate load forecasting model with minimal underpredictions is vital to prevent any undesired power outages due to underproduction of electricity. However, the power consumption patterns of the residential sector contain fluctuations and anomalies making them challenging to predict. In this paper, we propose multiple Long Short-Term Memory (LSTM) frameworks with different asymmetric loss functions to impose a higher penalty on underpredictions. We also apply a density-based spatial clustering of applications with noise (DBSCAN) anomaly detection approach, prior to the load forecasting task, to remove any present oultiers. Considering the effect of weather and social factors, seasonality splitting is performed on the three considered datasets from France, Germany, and Hungary containing hourly power consumption, weather, and calendar features. Root-mean-square error (RMSE) results show that removing the anomalies efficiently reduces the underestimation and overestimation errors in all the seasonal datasets. Additionally, asymmetric loss functions and seasonality splitting effectively minimize underestimations despite increasing the overestimation error to some degree. Reducing underpredictions of electricity consumption is essential to prevent power outages that can be damaging to the community.
Abstract:Graph Neural Networks have gained huge interest in the past few years. These powerful algorithms expanded deep learning models to non-Euclidean space and were able to achieve state of art performance in various applications including recommender systems and social networks. However, this performance is based on static graph structures assumption which limits the Graph Neural Networks performance when the data varies with time. Temporal Graph Neural Networks are extension of Graph Neural Networks that takes the time factor into account. Recently, various Temporal Graph Neural Network algorithms were proposed and achieved superior performance compared to other deep learning algorithms in several time dependent applications. This survey discusses interesting topics related to Spatio temporal Graph Neural Networks, including algorithms, application, and open challenges.
Abstract:Current interpretability methods focus on explaining a particular model's decision through present input features. Such methods do not inform the user of the sufficient conditions that alter these decisions when they are not desirable. Contrastive explanations circumvent this problem by providing explanations of the form "If the feature $X>x$, the output $Y$ would be different''. While different approaches are developed to find contrasts; these methods do not all deal with mutability and attainability constraints. In this work, we present a novel approach to locally contrast the prediction of any classifier. Our Contrastive Entropy-based explanation method, CEnt, approximates a model locally by a decision tree to compute entropy information of different feature splits. A graph, G, is then built where contrast nodes are found through a one-to-many shortest path search. Contrastive examples are generated from the shortest path to reflect feature splits that alter model decisions while maintaining lower entropy. We perform local sampling on manifold-like distances computed by variational auto-encoders to reflect data density. CEnt is the first non-gradient-based contrastive method generating diverse counterfactuals that do not necessarily exist in the training data while satisfying immutability (ex. race) and semi-immutability (ex. age can only change in an increasing direction). Empirical evaluation on four real-world numerical datasets demonstrates the ability of CEnt in generating counterfactuals that achieve better proximity rates than existing methods without compromising latency, feasibility, and attainability. We further extend CEnt to imagery data to derive visually appealing and useful contrasts between class labels on MNIST and Fashion MNIST datasets. Finally, we show how CEnt can serve as a tool to detect vulnerabilities of textual classifiers.
Abstract:Contrastive explanation methods go beyond transparency and address the contrastive aspect of explanations. Such explanations are emerging as an attractive option to provide actionable change to scenarios adversely impacted by classifiers' decisions. However, their extension to textual data is under-explored and there is little investigation on their vulnerabilities and limitations. This work motivates textual counterfactuals by laying the ground for a novel evaluation scheme inspired by the faithfulness of explanations. Accordingly, we extend the computation of three metrics, proximity,connectedness and stability, to textual data and we benchmark two successful contrastive methods, POLYJUICE and MiCE, on our suggested metrics. Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not obvious in both models. More interestingly, the generated contrastive texts are more attainable with POLYJUICE which highlights the significance of latent representations in counterfactual search. Finally, we perform the first semantic adversarial attack on textual recourse methods. The results demonstrate the robustness of POLYJUICE and the role that latent input representations play in robustness and reliability.
Abstract:Current Explainable AI (ExAI) methods, especially in the NLP field, are conducted on various datasets by employing different metrics to evaluate several aspects. The lack of a common evaluation framework is hindering the progress tracking of such methods and their wider adoption. In this work, inspired by offline information retrieval, we propose different metrics and techniques to evaluate the explainability of SA models from two angles. First, we evaluate the strength of the extracted "rationales" in faithfully explaining the predicted outcome. Second, we measure the agreement between ExAI methods and human judgment on a homegrown dataset1 to reflect on the rationales plausibility. Our conducted experiments comprise four dimensions: (1) the underlying architectures of SA models, (2) the approach followed by the ExAI method, (3) the reasoning difficulty, and (4) the homogeneity of the ground-truth rationales. We empirically demonstrate that anchors explanations are more aligned with the human judgment and can be more confident in extracting supporting rationales. As can be foreseen, the reasoning complexity of sentiment is shown to thwart ExAI methods from extracting supporting evidence. Moreover, a remarkable discrepancy is discerned between the results of different explainability methods on the various architectures suggesting the need for consolidation to observe enhanced performance. Predominantly, transformers are shown to exhibit better explainability than convolutional and recurrent architectures. Our work paves the way towards designing more interpretable NLP models and enabling a common evaluation ground for their relative strengths and robustness.
Abstract:While there has been a recent explosion of work on ExplainableAI ExAI on deep models that operate on imagery and tabular data, textual datasets present new challenges to the ExAI community. Such challenges can be attributed to the lack of input structure in textual data, the use of word embeddings that add to the opacity of the models and the difficulty of the visualization of the inner workings of deep models when they are trained on textual data. Lately, methods have been developed to address the aforementioned challenges and present satisfactory explanations on Natural Language Processing (NLP) models. However, such methods are yet to be studied in a comprehensive framework where common challenges are properly stated and rigorous evaluation practices and metrics are proposed. Motivated to democratize ExAI methods in the NLP field, we present in this work a survey that studies model-agnostic as well as model-specific explainability methods on NLP models. Such methods can either develop inherently interpretable NLP models or operate on pre-trained models in a post-hoc manner. We make this distinction and we further decompose the methods into three categories according to what they explain: (1) word embeddings (input-level), (2) inner workings of NLP models (processing-level) and (3) models' decisions (output-level). We also detail the different evaluation approaches interpretability methods in the NLP field. Finally, we present a case-study on the well-known neural machine translation in an appendix and we propose promising future research directions for ExAI in the NLP field.
Abstract:Advances in deep learning and transfer learning have paved the way for various automation classification tasks in agriculture, including plant diseases, pests, weeds, and plant species detection. However, agriculture automation still faces various challenges, such as the limited size of datasets and the absence of plant-domain-specific pretrained models. Domain specific pretrained models have shown state of art performance in various computer vision tasks including face recognition and medical imaging diagnosis. In this paper, we propose AgriNet dataset, a collection of 160k agricultural images from more than 19 geographical locations, several images captioning devices, and more than 423 classes of plant species and diseases. We also introduce AgriNet models, a set of pretrained models on five ImageNet architectures: VGG16, VGG19, Inception-v3, InceptionResNet-v2, and Xception. AgriNet-VGG19 achieved the highest classification accuracy of 94 % and the highest F1-score of 92%. Additionally, all proposed models were found to accurately classify the 423 classes of plant species, diseases, pests, and weeds with a minimum accuracy of 87% for the Inception-v3 model.Finally, experiments to evaluate of superiority of AgriNet models compared to ImageNet models were conducted on two external datasets: pest and plant diseases dataset from Bangladesh and a plant diseases dataset from Kashmir.
Abstract:In the aftermath of disasters, building damage maps are obtained using change detection to plan rescue operations. Current convolutional neural network approaches do not consider the similarities between neighboring buildings for predicting the damage. We present a novel graph-based building damage detection solution to capture these relationships. Our proposed model architecture learns from both local and neighborhood features to predict building damage. Specifically, we adopt the sample and aggregate graph convolution strategy to learn aggregation functions that generalize to unseen graphs which is essential for alleviating the time needed to obtain predictions for new disasters. Our experiments on the xBD dataset and comparisons with a classical convolutional neural network reveal that while our approach is handicapped by class imbalance, it presents a promising and distinct advantage when it comes to cross-disaster generalization.
Abstract:Change detection is instrumental to localize damage and understand destruction in disaster informatics. While convolutional neural networks are at the core of recent change detection solutions, we present in this work, BLDNet, a novel graph formulation for building damage change detection and enable learning relationships and representations from both local patterns and non-stationary neighborhoods. More specifically, we use graph convolutional networks to efficiently learn these features in a semi-supervised framework with few annotated data. Additionally, BLDNet formulation allows for the injection of additional contextual building meta-features. We train and benchmark on the xBD dataset to validate the effectiveness of our approach. We also demonstrate on urban data from the 2020 Beirut Port Explosion that performance is improved by incorporating domain knowledge building meta-features.