Abstract:In this chapter, we address the problem of rule mining, beginning with essential background information, including measures of rule quality. We then explore various rule mining methodologies, categorized into three groups: inductive logic programming, path sampling and generalization, and linear programming. Following this, we delve into neurosymbolic methods, covering topics such as the integration of deep learning with rules, the use of embeddings for rule learning, and the application of large language models in rule learning.
Abstract:The aim of this primer is to introduce the subject of knowledge engineering in a concise but synthetic way to develop the reader's intuition about the area.
Abstract:Ontology authoring is a complex process, where commonly the automated reasoner is invoked for verification of newly introduced changes, therewith amounting to a time-consuming test-last approach. Test-Driven Development (TDD) for ontology authoring is a recent {\em test-first} approach that aims to reduce authoring time and increase authoring efficiency. Current TDD testing falls short on coverage of OWL features and possible test outcomes, the rigorous foundation thereof, and evaluations to ascertain its effectiveness. We aim to address these issues in one instantiation of TDD for ontology authoring. We first propose a succinct, logic-based model of TDD testing and present novel TDD algorithms so as to cover also any OWL 2 class expression for the TBox and for the principal ABox assertions, and prove their correctness. The algorithms use methods from the OWL API directly such that reclassification is not necessary for test execution, therewith reducing ontology authoring time. The algorithms were implemented in TDDonto2, a Prot\'eg\'e plugin. TDDonto2 was evaluated on editing efficiency and by users. The editing efficiency study demonstrated that it is faster than a typical ontology authoring interface, especially for medium size and large ontologies. The user evaluation demonstrated that modellers make significantly less errors with TDDonto2 compared to the standard Prot\'eg\'e interface and complete their tasks better using less time. Thus, the results indicate that Test-Driven Development is a promising approach in an ontology development methodology.
Abstract:Competency Questions (CQs) are natural language questions outlining and constraining the scope of knowledge represented by an ontology. Despite that CQs are a part of several ontology engineering methodologies, we have observed that the actual publication of CQs for the available ontologies is very limited and even scarcer is the publication of their respective formalisations in terms of, e.g., SPARQL queries. This paper aims to contribute to addressing the engineering shortcomings of using CQs in ontology development, to facilitate wider use of CQs. In order to understand the relation between CQs and the queries over the ontology to test the CQs on an ontology, we gather, analyse, and publicly release a set of 234 CQs and their translations to SPARQL-OWL for several ontologies in different domains developed by different groups. We analysed the CQs in two principal ways. The first stage focused on a linguistic analysis of the natural language text itself, i.e., a lexico-syntactic analysis without any presuppositions of ontology elements, and a subsequent step of semantic analysis in order to find patterns. This increased diversity of CQ sources resulted in a 5-fold increase of hitherto published patterns, to 106 distinct CQ patterns, which have a limited subset of few patterns shared across the CQ sets from the different ontologies. Next, we analysed the relation between the found CQ patterns and the 46 SPARQL-OWL query signatures, which revealed that one CQ pattern may be realised by more than one SPARQL-OWL query signature, and vice versa. We hope that our work will contribute to establishing common practices, templates, automation, and user tools that will support CQ formulation, formalisation, execution, and general management.
Abstract:Emerging ontology authoring methods to add knowledge to an ontology focus on ameliorating the validation bottleneck. The verification of the newly added axiom is still one of trying and seeing what the reasoner says, because a systematic testbed for ontology authoring is missing. We sought to address this by introducing the approach of test-driven development for ontology authoring. We specify 36 generic tests, as TBox queries and TBox axioms tested through individuals, and structure their inner workings in an `open box'-way, which cover the OWL 2 DL language features. This is implemented as a Protege plugin so that one can perform a TDD test as a black box test. We evaluated the two test approaches on their performance. The TBox queries were faster, and that effect is more pronounced the larger the ontology is. We provide a general sequence of a TDD process for ontology engineering as a foundation for a TDD methodology.
Abstract:We propose a new method for mining frequent patterns in a language that combines both Semantic Web ontologies and rules. In particular we consider the setting of using a language that combines description logics with DL-safe rules. This setting is important for the practical application of data mining to the Semantic Web. We focus on the relation of the semantics of the representation formalism to the task of frequent pattern discovery, and for the core of our method, we propose an algorithm that exploits the semantics of the combined knowledge base. We have developed a proof-of-concept data mining implementation of this. Using this we have empirically shown that using the combined knowledge base to perform semantic tests can make data mining faster by pruning useless candidate patterns before their evaluation. We have also shown that the quality of the set of patterns produced may be improved: the patterns are more compact, and there are fewer patterns. We conclude that exploiting the semantics of a chosen representation formalism is key to the design and application of (onto-)relational frequent pattern discovery methods. Note: To appear in Theory and Practice of Logic Programming (TPLP)