Abstract:Different attribution-scores have been proposed to quantify the relevance of database tuples for a query answer from a database. Among them, we find Causal Responsibility, the Shapley Value, the Banzhaf Power-Index, and the Causal Effect. They have been analyzed in isolation, mainly in terms of computational properties. In this work, we start an investigation into the alignment of these scores on the basis of the queries at hand; that is, on whether they induce compatible rankings of tuples. We are able to identify vast classes of queries for which some pairs of scores are always aligned, and others for which they are not. It turns out that the presence of exogenous tuples makes a crucial difference in this regard.
Abstract:The Causal Effect (CE) is a numerical measure of causal influence of variables on observed results. Despite being widely used in many areas, only preliminary attempts have been made to use CE as an attribution score in data management, to measure the causal strength of tuples for query answering in databases. In this work, we introduce, generalize and investigate the so-called Causal-Effect Score in the context of classical and probabilistic databases.
Abstract:Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it needs to be assigned subjectively or be estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning on SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework may contribute to a more robust feature scoring.
Abstract:We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database. For classification models, the responsibility score is properly extended and illustrated. The efficient computation of Shap-score is also analyzed and discussed. The emphasis is placed on work done by the author and collaborators.
Abstract:We describe some recent approaches to score-based explanations for query answers in databases. The focus is on work done by the author and collaborators. Special emphasis is placed on the use of counterfactual reasoning for score specification and computation. Several examples that illustrate the flexibility of these methods are shown.
Abstract:We address the problem of efficiently computing Shap explanation scores for classifications with machine learning models. With this goal, we show the transformation of binary neural networks (BNNs) for classification into deterministic and decomposable Boolean circuits, for which knowledge compilation techniques are used. The resulting circuit is treated as an open-box model, to compute Shap scores by means of a recent efficient algorithm for this class of circuits. Detailed experiments show a considerable gain in performance in comparison with computing Shap directly on the BNN treated as a black-box model.
Abstract:In this expository article we highlight the relevance of explanations for artificial intelligence, in general, and for the newer developments in {\em explainable AI}, referring to origins and connections of and among different approaches. We describe in simple terms, explanations in data management and machine learning that are based on attribution-scores, and counterfactuals as found in the area of causality. We elaborate on the importance of logical reasoning when dealing with counterfactuals, and their use for score computation.
Abstract:We briefly describe -- mainly through very simple examples -- different kinds of answer-set programs with annotations that have been proposed for specifying: database repairs and consistent query answering; secrecy view and query evaluation with them; counterfactual interventions for causality in databases; and counterfactual-based explanations in machine learning.
Abstract:There are some recent approaches and results about the use of answer-set programming for specifying counterfactual interventions on entities under classification, and reasoning about them. These approaches are flexible and modular in that they allow the seamless addition of domain knowledge. Reasoning is enabled by query answering from the answer-set program. The programs can be used to specify and compute responsibility-based numerical scores as attributive explanations for classification results.
Abstract:Consistent answers to a query from a possibly inconsistent database are answers that are simultaneously retrieved from every possible repair of the database. Repairs are consistent instances that minimally differ from the original inconsistent instance. It has been shown before that database repairs can be specified as the stable models of a disjunctive logic program. In this paper we show how to use the repair programs to transform the problem of consistent query answering into a problem of reasoning w.r.t. a theory written in second-order predicate logic. It also investigated how a first-order theory can be obtained instead by applying second-order quantifier elimination techniques.