Abstract: There are many methodologies and techniques for easing the task of ontology building. Here we describe the intersection of two of these: ontology normalisation and fully programmatic ontology development. The first describes a standardised organisation for an ontology, with singly inherited self-standing entities and a number of small taxonomies of refining entities. The former are described and defined in terms of the latter, which are then used to manage the polyhierarchy of the self-standing entities. Fully programmatic development is a technique where an ontology is developed using a domain-specific language within a programming language, meaning that, as well as defining ontological entities, it is possible to add arbitrary patterns or new syntax within the same environment. We describe how new patterns can be used to enable a new style of ontology development that we call hypernormalisation.
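To give a concrete flavour of such patterns, here is a minimal sketch using documented Tawny-OWL forms; the ontology, IRI and all class and property names are hypothetical. It builds a small, disjoint and covering taxonomy of refining entities, then uses an ordinary Clojure function as a pattern that defines self-standing entities in terms of them, leaving the polyhierarchy to the reasoner:

```clojure
(ns example.hypernorm
  (:use [tawny.owl]))

(defontology hnorm
  :iri "http://example.com/hnorm")

;; A small, disjoint and covering taxonomy of refining entities.
(defclass Size)
(as-subclasses Size
  :disjoint :cover
  (defclass Small)
  (defclass Medium)
  (defclass Large))

(defoproperty hasSize
  :range Size
  :characteristic :functional)

;; A self-standing entity, kept singly inherited.
(defclass Cell)

;; A pattern, written as an ordinary Clojure function, that defines
;; a self-standing subclass in terms of a refining entity; the
;; polyhierarchy is then recovered by reasoning.
(defn sized-cell [nam size]
  (owl-class nam
             :super Cell
             :equivalent (owl-and Cell (owl-some hasSize size))))

(sized-cell "SmallCell" Small)
(sized-cell "LargeCell" Large)
```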
Abstract: The process of building ontologies is a difficult task that requires ongoing collaboration between ontology developers and domain experts. This collaboration is made harder because the two groups tend to use different tool sets. In this paper, we propose to decrease the distance between domain experts and ontology developers by creating more readable forms of ontologies, and further to enable editing in normal office environments. Building on a programmatic ontology development environment such as Tawny-OWL, we are now able to generate these readable and editable forms from the raw ontological source and its embedded comments. We have implemented this translation to HTML for reading; this environment provides rich hyperlinking as well as active features such as hiding the source code in favour of comments. We are now working on a translation to a Word document that also enables editing. Taken together, these should provide a significant new route for collaboration between the ontologist and domain specialist.
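As a sketch of the kind of raw source this workflow consumes (the ontology and names are hypothetical, and the comment conventions here are illustrative rather than the tool's actual markup), definitions and literate comments sit side by side; the HTML translation renders the comments as hyperlinked prose and can hide the Lisp source from the reader:

```clojure
(ns example.readable
  (:use [tawny.owl]))

(defontology readable
  :iri "http://example.com/readable")

;; # Pizza
;;
;; A pizza is a flat bread with toppings. In the HTML form this
;; comment becomes readable prose, with entity names such as
;; MargheritaPizza rendered as hyperlinks; the source forms
;; below can be hidden in favour of the comments.
(defclass Pizza
  :comment "A flat bread with toppings.")

(defclass MargheritaPizza
  :super Pizza
  :comment "A pizza with tomato and mozzarella toppings.")
```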
Abstract: In this article, we investigate our early attempts at building an ontology describing rehabilitation therapies following brain injury. These therapies are wide-ranging, involving interventions of many different kinds, and as a result they are hard to describe. As well as restricting actual practice, this is a major impediment to evidence-based medicine, as it is hard to meaningfully compare two treatment plans. Ontology development requires significant effort from both ontologists and domain experts. Knowledge elicited from domain experts forms the scope of the ontology. The process of knowledge elicitation is expensive, consumes experts' time and may be biased by the selection of the experts. Various methodologies and techniques exist for enabling this knowledge elicitation, including community groups and open development practices. A related problem is that of defining scope. By defining the scope, we can decide whether a concept (i.e. term) should be represented in the ontology. This is the opposite of knowledge elicitation, in the sense that it defines what should not be in the ontology, and can be addressed by pre-defining a set of competency questions. These approaches are, however, expensive and time-consuming. Here, we describe our work toward an alternative approach: bootstrapping the ontology from an initially small corpus of literature that will define the scope of the ontology, expanding this to a set covering the domain, and then using information extraction to define an initial terminology that provides the basis and the competencies for the ontology. We discuss four approaches to building a suitable corpus that offers both sufficient coverage and precision.
Abstract: Ontology development relates to software development in that both involve the production of formal computational knowledge. It is possible, therefore, that some of the techniques used in software engineering could also be applied to ontologies; in software engineering, for example, testing is a well-established process and part of many different methodologies. The application of testing to ontologies therefore seems attractive. The Karyotype Ontology is developed using the novel Tawny-OWL library, which provides a fully programmatic environment for ontology development, including a complete test harness. In this paper, we describe how we have used this harness to build an extensive series of tests, and how we have used a commodity continuous integration system to link testing deeply into our development process; this environment is applicable to any OWL ontology, whether written using Tawny-OWL or not. Moreover, we present a novel analysis of our tests, introducing a new classification of the different kinds of test. For each class of test, we describe why we use it, drawing comparisons with software tests. We believe that this systematic comparison between ontology and software development will help us move to a more agile form of ontology development.
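As a minimal sketch of what such a test can look like (the ontology namespace and name are hypothetical; consistent?, coherent? and reasoner-factory come from tawny.reasoner, and the :hermit keyword assumes the HermiT reasoner is on the classpath), a standard clojure.test case can assert coherency and so run under any continuous integration system:

```clojure
(ns karyotype.core-test
  (:require [clojure.test :refer [deftest is]]
            [tawny.reasoner :as r]
            [example.karyotype :as k])) ; hypothetical ontology namespace

;; Select a reasoner implementation for subsequent queries.
(r/reasoner-factory :hermit)

(deftest coherency
  ;; The ontology should be consistent and should contain no
  ;; unsatisfiable classes.
  (is (r/consistent? k/karyotype))
  (is (r/coherent? k/karyotype)))
```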
Abstract: Background: Understanding the distinction between function and role is vexing and difficult. While the distinction appears to be useful, in practice it is hard to apply, particularly within biology. Results: I take an evolutionary approach, considering a series of examples, to develop definitions for these concepts. I test them in practice against the Ontology for Biomedical Investigations (OBI). Finally, I give an axiomatisation and discuss methods for applying these definitions in practice. Conclusions: The definitions in this paper are applicable, formalising current practice. As such, they make a significant contribution to the use of these concepts within biomedical ontologies.
Abstract: A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which is often the richest source of knowledge. Many databases reuse existing knowledge: during the curation process, annotations are often propagated between entries. However, this reuse is often not made explicit, so it can be hard, and potentially impossible, for a reader to identify where an annotation originated. In this work, we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB) at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially, levels of sentence reuse within UniProtKB were analysed, showing that reuse is highly prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over 100,000 sentences. Over 8,000 sentences remain in the database after they have been removed from the entries in which they originally occurred. Analysis of a subset of these sentences suggests that approximately 30% are erroneous, whilst 35% appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in determining the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website.
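A minimal sketch of the core reuse-detection step, in plain Clojure and assuming entries have already been reduced to [accession annotation-text] pairs; the naive sentence splitter and all names are illustrative, not the pipeline actually used:

```clojure
(ns reuse.sketch
  (:require [clojure.string :as str]))

(defn sentences
  "Naively split annotation text into sentences; real UniProtKB
  text would need a more careful splitter."
  [text]
  (map str/trim (str/split text #"(?<=[.!?])\s+")))

(defn sentence-index
  "Index each sentence to the set of entry accessions in which it
  occurs. Entries are [accession annotation-text] pairs."
  [entries]
  (reduce (fn [idx [accession text]]
            (reduce (fn [idx s]
                      (update idx s (fnil conj #{}) accession))
                    idx
                    (sentences text)))
          {}
          entries))

(defn reused
  "Sentences occurring in more than one entry: candidates for
  provenance and propagation analysis."
  [idx]
  (into {} (filter (fn [[_ accessions]] (> (count accessions) 1)) idx)))

;; (reused (sentence-index [["P12345" "Binds DNA. Involved in repair."]
;;                          ["P67890" "Binds DNA."]]))
;; => {"Binds DNA." #{"P12345" "P67890"}}
```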
Abstract: The Tawny-OWL library provides a fully programmatic environment for ontology building; it enables the use of a rich set of tools for ontology development by recasting development as a form of programming. It is built in Clojure, a modern Lisp dialect, and is backed by the OWL API. Used simply, it has a syntax similar to OWL Manchester syntax, but it provides arbitrary extensibility and abstraction. It builds on existing facilities for Clojure, which provide a rich and modern programming tool chain for versioning, distributed development, building, testing and continuous integration. In this paper, we describe the library, this environment and their potential implications for the ontology development process.
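A minimal sketch (hypothetical ontology, IRI and names, but documented Tawny-OWL forms) shows the resemblance to Manchester syntax while remaining ordinary Clojure, and therefore open to abstraction:

```clojure
(ns example.pizza
  (:use [tawny.owl]))

(defontology pizza
  :iri "http://example.com/pizza")

(defclass Pizza)
(defclass CheeseTopping)
(defoproperty hasTopping)

;; Reads much like the Manchester syntax frame:
;;   Class: CheesyPizza
;;     EquivalentTo: Pizza and hasTopping some CheeseTopping
(defclass CheesyPizza
  :equivalent (owl-and Pizza
                       (owl-some hasTopping CheeseTopping)))
```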