Ludwig-Maximilians-Universität München
Abstract: We study the generalization behavior of Markov Logic Networks (MLNs) across relational structures of different sizes. Several works have observed that MLNs learned on a given domain generalize poorly across domains of different sizes. This behavior emerges from a lack of internal consistency within an MLN when used across different domain sizes. In this paper, we quantify this inconsistency and bound it in terms of the variance of the MLN parameters. The parameter variance also bounds the KL divergence between an MLN's marginal distributions taken from different domain sizes. We use these bounds to show that maximizing the data log-likelihood while simultaneously minimizing the parameter variance corresponds to two natural notions of generalization across domain sizes. Our theoretical results apply to Exponential Random Graphs and other Markov-network-based relational models. Finally, we observe that solutions known to decrease the variance of the MLN parameters, such as regularization and Domain-Size Aware MLNs, increase the internal consistency of MLNs. We empirically verify our results on four different datasets, using different methods to control parameter variance, showing that controlling parameter variance leads to better generalization.
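The inconsistency at issue is easy to exhibit on a toy exponential random graph model. Below is a minimal brute-force sketch (illustrative, not the paper's code), assuming a single weighted "two-star" statistic; with the weight held fixed, the marginal probability of one edge drifts as the domain grows, so the size-$n$ model is not the marginal of the size-$(n+1)$ model.

```python
# Brute-force demo (illustrative sketch) of cross-domain-size
# inconsistency in an exponential random graph model:
#   P(G) ~ exp(w * #two-stars(G))  with a fixed weight w.
from itertools import combinations, product
from math import exp

def edge_marginal(n, w):
    """Marginal probability that edge (0, 1) is present on n nodes."""
    pairs = list(combinations(range(n), 2))
    total = with_edge = 0.0
    for bits in product((0, 1), repeat=len(pairs)):
        present = {p for p, b in zip(pairs, bits) if b}
        degrees = [sum(v in p for p in present) for v in range(n)]
        two_stars = sum(d * (d - 1) // 2 for d in degrees)  # length-2 paths
        weight = exp(w * two_stars)
        total += weight
        if (0, 1) in present:
            with_edge += weight
    return with_edge / total

# Same parameter, different domain sizes: the edge marginal drifts.
for n in (3, 4, 5):
    print(n, round(edge_marginal(n, 0.5), 4))
```

Rescaling $w$ downwards as $n$ grows, in the spirit of Domain-Size Aware MLNs, dampens exactly this drift.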
Abstract: Pearl and Verma developed d-separation as a widely used graphical criterion to reason about the conditional independencies that are implied by the causal structure of a Bayesian network. As acyclic ground probabilistic logic programs correspond to Bayesian networks on their dependency graphs, we can compute conditional independencies from d-separation in the latter. In the present paper, we generalize this reasoning to the non-ground case. First, we abstract the notion of a probabilistic logic program away from external databases and probabilities to obtain so-called program structures. We then present a correct meta-interpreter that decides whether a certain conditional independence statement is implied by a program structure on a given external database. Finally, we give a fragment of program structures for which we obtain a completeness statement for our conditional independence oracle. We close with an experimental evaluation of our approach, revealing that our meta-interpreter performs significantly faster than checking the definition of independence using exact inference in ProbLog 2.
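For the ground case the abstract builds on, the d-separation test itself is short to state. The following is a hedged, self-contained Python sketch (not the paper's meta-interpreter), using the classical moralisation criterion: $X$ and $Y$ are d-separated by $Z$ iff, after deleting $Z$, they are disconnected in the moral graph of the ancestral subgraph of $X \cup Y \cup Z$. The predicate names are illustrative.

```python
# Ground d-separation via moralisation (standard algorithm, sketch).
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    # collect ancestors of the query variables (including themselves)
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))
    # moralise: undirect all edges and marry co-parents
    adj = {v: set() for v in anc}
    for v in anc:
        ps = [p for p in parents.get(v, ()) if p in anc]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for p, q in combinations(ps, 2):
            adj[p].add(q); adj[q].add(p)
    # reachability from X to Y after removing Z
    seen, stack = set(), [v for v in xs if v not in zs]
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        if v not in seen and v not in zs:
            seen.add(v)
            stack.extend(adj[v])
    return True

# dependency graph of the ground program: alarm :- burglary. alarm :- earthquake.
parents = {"alarm": {"burglary", "earthquake"}}
print(d_separated(parents, {"burglary"}, {"earthquake"}, set()))      # True
print(d_separated(parents, {"burglary"}, {"earthquake"}, {"alarm"}))  # False: collider
```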
Abstract: Probabilistic logic programs are logic programs where some facts hold with a specified probability. Here, we investigate these programs within a causal framework that allows counterfactual queries. Learning the program structure from observational data is usually done through heuristic search relying on statistical tests. However, these statistical tests lack information about the causal mechanism generating the data, which makes it infeasible to use the resulting programs for counterfactual reasoning. To address this, we propose a language fragment that allows reconstructing a program from its induced distribution. This further enables us to learn programs supporting counterfactual queries.
Abstract: A ProbLog program is a logic program with facts that only hold with a specified probability. In this contribution we extend the ProbLog language with the ability to answer "What if" queries. Intuitively, a ProbLog program defines a distribution by solving a system of equations in terms of mutually independent predefined Boolean random variables. In the theory of causality, Judea Pearl proposes counterfactual reasoning for such systems of equations. Based on Pearl's calculus, we provide a procedure for processing these counterfactual queries on ProbLog programs, together with a proof of correctness and a full implementation. Using the latter, we provide insights into the influence of different parameters on the scalability of inference. Finally, we show that our approach is consistent with CP-logic, i.e. with the causal semantics for logic programs with annotated disjunctions.
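Pearl's recipe for such queries has three steps: abduction (condition the exogenous variables on the evidence), action (apply the intervention), and prediction (re-solve the equations). A minimal sketch (mine, not the paper's implementation) on a toy ProbLog-style program, enumerating total choices of the Boolean random variables:

```python
# Counterfactual query on:  0.3::burglary.  0.1::earthquake.
#                           alarm :- burglary.  alarm :- earthquake.
# Given that the alarm rang, would it still have rung under
# the intervention do(burglary = false)?
from itertools import product

facts = {"burglary": 0.3, "earthquake": 0.1}

def derive(choice, intervention=None):
    """Run the deterministic part on a total choice of the random facts."""
    world = {**choice, **(intervention or {})}
    world["alarm"] = world["burglary"] or world["earthquake"]
    return world

evidence_mass = counterfactual_mass = 0.0
for bits in product((True, False), repeat=len(facts)):
    choice = dict(zip(facts, bits))
    p = 1.0
    for f, b in choice.items():
        p *= facts[f] if b else 1 - facts[f]
    if derive(choice)["alarm"]:            # abduction: condition on evidence
        evidence_mass += p
        if derive(choice, {"burglary": False})["alarm"]:  # action + prediction
            counterfactual_mass += p

print(counterfactual_mass / evidence_mass)  # 0.1 / 0.37, roughly 0.27
```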
Abstract: We generalise the distribution semantics underpinning probabilistic logic programming by distilling its essential concept, the separation of a free random component and a deterministic part. This abstracts the core ideas beyond logic programming as such to encompass frameworks from probabilistic databases, probabilistic finite model theory and discrete lifted Bayesian networks. To demonstrate the usefulness of such a general approach, we completely characterise the projective families of distributions representable in the generalised distribution semantics. We demonstrate both that large classes of interesting projective families cannot be represented in a generalised distribution semantics, and that already a very limited fragment of logic programming (acyclic determinate logic programs) in the deterministic part suffices to represent all those projective families that are representable in the generalised distribution semantics at all.
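Concretely, the separation the abstract distils is the familiar distribution semantics equation; in notation of my choosing (not the paper's): a set $\mathcal{F}$ of mutually independent probabilistic facts, each $f$ true with probability $p_f$, and a deterministic mapping $T$ sending each total choice $F \subseteq \mathcal{F}$ to a possible world, so that

$$P(\omega) \;=\; \sum_{F \subseteq \mathcal{F} :\, T(F) = \omega} \;\prod_{f \in F} p_f \prod_{f \in \mathcal{F} \setminus F} (1 - p_f).$$

The free random component supplies all the probability; the frameworks listed above differ only in what computes $T$.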
Abstract: The behaviour of statistical relational representations across differently sized domains has become a focal area of research from both a modelling and a complexity viewpoint. In 2018, Jaeger and Schulte suggested projectivity of a family of distributions as a key property, ensuring that marginal inference is independent of the domain size. However, Jaeger and Schulte assume that the domain is characterised only by its size. This contribution extends the notion of projectivity from families of distributions indexed by domain size to functors taking extensional data from a database. This makes projectivity available for the wide range of applications taking structured input. We transfer the known attractive properties of projective families of distributions to the new setting. Furthermore, we prove a correspondence between projectivity and distributions on countably infinite domains, which we use to unify and generalise earlier work on statistical relational representations in infinite domains. Finally, we use the extended notion of projectivity to define a further strengthening, which we call $\sigma$-projectivity, and which allows the use of the same representation in different modes while retaining projectivity.
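For reference, projectivity in the size-indexed sense of Jaeger and Schulte can be stated as follows (notation mine): a family $(P_n)_{n \in \mathbb{N}}$ of distributions over relational structures on domains $D_n = \{1, \dots, n\}$ is projective if marginalising $P_n$ onto the substructure induced by any $m$ elements recovers $P_m$,

$$P_n\big|_{\{a_1, \dots, a_m\}} \;=\; P_m \qquad \text{for all } m \le n \text{ and all } \{a_1, \dots, a_m\} \subseteq D_n.$$

The contribution generalises the index of the family from the bare size $n$ to the extensional data of a database.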
Abstract: Dependencies on the relative frequency of a state in the domain are common when modelling probabilistic dependencies on relational data. For instance, the likelihood of a school closure during an epidemic might depend on the proportion of infected pupils exceeding a threshold. Often, rather than depending on discrete thresholds, dependencies are continuous: for instance, the likelihood of any one mosquito bite transmitting an illness depends on the proportion of carrier mosquitoes. Current approaches usually only consider probabilities over possible worlds rather than over domain elements themselves. We introduce two formalisms that explicitly incorporate relative frequencies into statistical relational artificial intelligence. The first formalism, Lifted Bayesian Networks for Conditional Probability Logic, expresses discrete dependencies on probabilistic data. The second formalism, Functional Lifted Bayesian Networks, expresses continuous dependencies. Incorporating relative frequencies is not only beneficial to modelling; it also provides a more rigorous approach to learning problems where training and test or application domains have different sizes. To this end, we provide a representation of the asymptotic probability distributions induced by the two formalisms on domains of increasing size. Since this representation has well-understood scaling behaviour across domain sizes, it can be used to estimate parameters for a large domain consistently from randomly sampled subpopulations.
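The mosquito example admits a one-line formalisation of the continuous case (notation mine, not the paper's): for a bite $b$ on a domain $\Delta$,

$$P\big(\mathrm{transmits}(b)\big) \;=\; f\!\left(\frac{|\{x \in \Delta : \mathrm{carrier}(x)\}|}{|\Delta|}\right),$$

where $f$ is a continuous function of the carrier proportion rather than a step function at a discrete threshold; the discrete school-closure dependency is recovered by taking $f$ to be such a step function.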
Abstract: We consider Markov logic networks and relational logistic regression as two fundamental representation formalisms in statistical relational artificial intelligence that use weighted formulas in their specification. However, Markov logic networks are based on undirected graphs, while relational logistic regression is based on directed acyclic graphs. We show that when scaling the weight parameters with the domain size, the asymptotic behaviour of a relational logistic regression model is transparently controlled by the parameters, and we supply an algorithm to compute asymptotic probabilities. We show by means of two examples that this is not true for Markov logic networks. We further discuss, using several examples mainly drawn from the literature, how the application context can help the user decide when such scaling is appropriate and when the raw unscaled parameters might be preferable. We highlight random sampling as a particularly promising area of application for scaled models and expound possible avenues for further research.
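The effect of scaling is easy to see numerically. Below is a toy sketch (illustrative, not the paper's algorithm) of a single relational logistic regression unit: with raw weights the sigmoid argument grows with the number of groundings and saturates, while weights scaled by $1/n$ make the output converge to a function of the relative frequency of true parents. All parameter values are hypothetical.

```python
# Raw vs. domain-size-scaled weights in a relational logistic
# regression unit: P(Q) = sigmoid(bias + w * #true-parents).
from math import exp
from random import random, seed

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

seed(0)
bias, w, p_parent = -1.0, 0.2, 0.4   # hypothetical parameters
for n in (10, 100, 1000, 10000):
    count = sum(random() < p_parent for _ in range(n))  # true parents among n
    raw = sigmoid(bias + w * count)          # saturates towards 1 as n grows
    scaled = sigmoid(bias + w * count / n)   # -> sigmoid(bias + w * p_parent)
    print(f"n={n:>5}  raw={raw:.4f}  scaled={scaled:.4f}")
```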
Abstract: In recent years, there has been increasing research on the scaling behaviour of statistical relational representations with the size of the domain, and on the connections between domain-size dependence and lifted inference. In particular, the asymptotic behaviour of statistical relational representations has come under scrutiny, and projectivity was isolated as the strongest form of domain-size independence. In this contribution we show that every probabilistic logic program under the distribution semantics is asymptotically equivalent to a probabilistic logic program consisting only of range-restricted clauses over probabilistic facts. To facilitate the application of classical results from finite model theory, we introduce the abstract distribution semantics, defined as an arbitrary logical theory over probabilistic facts, to bridge the gap to the distribution semantics underlying probabilistic logic programming. In this representation, range-restricted logic programs correspond to quantifier-free theories, making asymptotic quantifier results available for use. We conclude that every probabilistic logic program inducing a projective family of distributions is in fact captured by this class, and we infer interesting consequences for the expressivity of probabilistic logic programs as well as for the asymptotic behaviour of probabilistic rules.
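As a hedged illustration of the target fragment (example mine, not from the paper): a clause is range-restricted when every variable of its head also occurs in a positive body atom, as in

$$\mathrm{infected}(X) \leftarrow \mathrm{contact}(X, Y) \wedge \mathrm{infected}(Y) \wedge \mathrm{transmit}(X, Y),$$

with $\mathrm{transmit}/2$ a probabilistic fact. Grounding such a clause is driven entirely by the atoms present in the data rather than by quantification over the whole domain, which is what aligns the fragment with quantifier-free theories.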