Abstract:Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We aim to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large, semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false, in which negation appears in different forms in about two thirds of the corpus. We have used our dataset with the largest available open LLMs in a zero-shot approach to assess their generalization and inference capabilities, and we have also fine-tuned some of the models to determine whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation persists, highlighting the ongoing challenges of LLMs regarding negation understanding and generalization. The dataset and code are publicly available.
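As a concrete illustration of the zero-shot setup described in the abstract above, the following minimal Python sketch prompts an open LLM to label statements as True or False. The model name, prompt template and example sentences are illustrative assumptions, not the paper's actual data or protocol.

# Minimal zero-shot true/false probe for negated commonsense statements.
# Model, prompt and sentences below are assumptions for illustration only.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")  # assumed open LLM

PROMPT = 'Answer with "True" or "False".\nStatement: {sentence}\nAnswer:'

sentences = [
    "A dog is an animal.",       # affirmative, true
    "A dog is not an animal.",   # negated, false
    "A dog is not a building.",  # negated, true
]

for s in sentences:
    prompt = PROMPT.format(sentence=s)
    out = generator(prompt, max_new_tokens=3, do_sample=False)
    answer = out[0]["generated_text"][len(prompt):].strip()
    print(f"{s} -> {answer}")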
Abstract:We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect knowledge misalignments, mapping errors and gaps in the involved knowledge resources. Our final objective is to extract guidelines towards a better exploitation of this commonsense knowledge framework by improving the included resources.
Abstract:In commonsense knowledge representation, the Open World Assumption is adopted as a general standard strategy for the design, construction and use of ontologies, e.g. in OWL. This strategy limits the inference capabilities of any system using these ontologies, because non-asserted statements could be assumed to be alternatively true or false in different interpretations. In this paper, we investigate the application of the Closed World Assumption to enable a better exploitation of the structural knowledge encoded in a SUMO-based ontology. To that end, we explore three different Closed World Assumption formulations for subclass and disjoint relations in order to reduce the ambiguity of the knowledge encoded in first-order logic ontologies. We evaluate these formulations in a practical experiment using a very large commonsense benchmark automatically obtained from the knowledge encoded in WordNet through its mapping to SUMO. The results show that the competency of the ontology improves by more than 47% when reasoning under the Closed World Assumption. In conclusion, automatically applying the Closed World Assumption to first-order logic ontologies reduces their expressed ambiguity and allows more commonsense questions to be answered.
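To give a flavour of what such a formulation can look like, the following is a simplified, assumed example of a closed-world reading of the disjoint relation (not necessarily one of the three formulations evaluated in the paper): any two classes for which no common subclass is asserted in the hierarchy are taken to be disjoint.

% Illustrative closed-world completion of disjointness (an assumption for
% exposition, not the paper's exact formulation):
\forall C_1 \, \forall C_2 \;
  \big( \neg \exists C \, ( \mathit{subclass}(C, C_1) \wedge \mathit{subclass}(C, C_2) )
        \rightarrow \mathit{disjoint}(C_1, C_2) \big)

Here the existential condition is evaluated only against the explicitly asserted (and transitively closed) subclass facts, which is exactly the kind of reading the Closed World Assumption licenses.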
Abstract:Formal ontologies are axiomatizations in a logic-based formalism. The development of formal ontologies, and their important role in the Semantic Web area, is generating considerable research on the use of automated reasoning techniques and tools to help in ontology engineering. One of the main aims is to refine and improve axiomatizations so that automated reasoning tools can efficiently infer reliable information. Defects in the axiomatization can not only cause wrong inferences, but can also hinder the inference of expected information, either by increasing its computational cost or even by preventing it. In this paper, we introduce a novel, fully automatic white-box testing framework for first-order logic ontologies. Our methodology is based on the detection of inference-based redundancies in the given axiomatization. The application of the proposed testing method is fully automatic since a) the automated generation of tests is guided only by the syntax of the axioms and b) the evaluation of tests is performed by automated theorem provers. Our proposal enables the detection of defects and serves to certify the degree of suitability, for reasoning purposes, of every axiom. We formally define the set of tests that are generated from any axiom and prove that every test is logically related to redundancies in the axiom from which it has been generated. We have implemented our method and used this implementation to automatically detect several non-trivial defects that were hidden in various first-order logic ontologies. Throughout the paper we provide illustrative examples of these defects, explain how they were found, and show how each proof, given by an automated theorem prover, provides useful hints on the nature of each defect. Additionally, by correcting all the detected defects, we have obtained an improved version of one of the tested ontologies: Adimen-SUMO.
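One way to picture an inference-based redundancy test of the kind mentioned above (an assumed reading for illustration, not the framework's formal definition of tests) is the following Python sketch: for an axiom of the form (a & b) => c, an ATP is asked whether the ontology already entails the stronger statement a => c, in which case the literal b is redundant. The toy axioms are hypothetical, and the sketch assumes the E theorem prover is installed as the "eprover" binary.

# Sketch of a single redundancy test evaluated with an external ATP (E prover).
# Ontology, axiom under test and prover invocation are illustrative assumptions.
import subprocess, tempfile

ontology = [
    "fof(ax1, axiom, ! [X] : (human(X) => mammal(X))).",
    "fof(ax2, axiom, ! [X] : (mammal(X) => animal(X))).",
]
# Axiom under test: (human(X) & bird(X)) => animal(X). Is bird(X) redundant?
test = "fof(test, conjecture, ! [X] : (human(X) => animal(X)))."

with tempfile.NamedTemporaryFile("w", suffix=".p", delete=False) as f:
    f.write("\n".join(ontology + [test]))
    path = f.name

result = subprocess.run(["eprover", "--auto", "--cpu-limit=10", path],
                        capture_output=True, text=True)
print("redundancy detected" if "SZS status Theorem" in result.stdout
      else "no redundancy found")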
Abstract:In this paper, we report on the practical application of a novel approach to validating the knowledge of WordNet using Adimen-SUMO. In particular, this paper focuses on cross-checking the WordNet meronymy relations against the knowledge encoded in Adimen-SUMO. Our validation approach tests a large set of competency questions (CQs), which are derived (semi-)automatically from the knowledge encoded in WordNet, SUMO and their mapping, by applying efficient first-order logic automated theorem provers. Unfortunately, despite being created manually, these knowledge resources are not free of errors and discrepancies. Consequently, some of the resulting CQs are not plausible according to the knowledge included in Adimen-SUMO. Thus, we focus first on (semi-)automatically improving the alignment between these knowledge resources and, second, on performing a minimal set of corrections in the ontology. Our aim is to minimize the manual effort required for an extensive validation process. We report on the strategies followed, the changes made, the effort needed and their impact when validating the WordNet meronymy relations using the improved versions of the mapping and the ontology. Based on the new results, we discuss the implications of the appropriate corrections and the need for future enhancements.
Abstract:Artificial Intelligence aims to provide computer programs with commonsense knowledge to reason about our world. This paper offers a new practical approach towards automated commonsense reasoning with first-order logic (FOL) ontologies. We propose a new black-box testing methodology for FOL SUMO-based ontologies that exploits WordNet and its mapping into SUMO. Our proposal includes a method for the (semi-)automatic creation of a very large benchmark of competency questions and a procedure for its automated evaluation using automated theorem provers (ATPs). Applying different quality criteria, our testing proposal enables a successful evaluation of a) the competency of several translations of SUMO into FOL and b) the performance of various ATPs. Finally, we also provide a fine-grained and complete analysis of the commonsense reasoning competency of current FOL SUMO-based ontologies.
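To illustrate how competency questions can be (semi-)automatically derived from WordNet and its mapping into SUMO, the following Python sketch turns a hypothetical antonymy pattern over mapped synsets into a TPTP conjecture. The mapping entries, the pattern and the rendering are simplifying assumptions, not the benchmark's exact generation procedure.

# Toy CQ generation: a WordNet relation between mapped synsets becomes a
# first-order conjecture for an ATP. All entries below are assumptions.
mapping = {              # hypothetical WordNet-synset -> SUMO-class mapping
    "man.n.01": "Man",
    "woman.n.01": "Woman",
}
antonym_pairs = [("man.n.01", "woman.n.01")]  # hypothetical antonymy pattern

def cq_from_antonymy(syn_a, syn_b):
    """Render a disjointness-style competency question in TPTP syntax."""
    a, b = mapping[syn_a], mapping[syn_b]
    return (f"fof(cq_{a.lower()}_{b.lower()}, conjecture, "
            f"~ ? [X] : (instance(X, {a}) & instance(X, {b}))).")

for syn_a, syn_b in antonym_pairs:
    print(cq_from_antonymy(syn_a, syn_b))
# -> fof(cq_man_woman, conjecture, ~ ? [X] : (instance(X, Man) & instance(X, Woman))).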
Abstract:We report on the results of evaluating the competency of a first-order ontology for use with automated theorem provers (ATPs). The evaluation follows the adaptation of the methodology based on competency questions (CQs) [Grüninger & Fox, 1995] to the framework of first-order logic presented in [Álvez, Lucio & Rigau, 2015], and is applied to Adimen-SUMO [Álvez, Lucio & Rigau, 2015]. The set of CQs used for this evaluation has been automatically generated from a small set of semantic patterns and the mapping of WordNet to SUMO. Analysing the results, we conclude that it is feasible to use ATPs for working with Adimen-SUMO v2.4, enabling the resolution of goals by performing non-trivial inferences.
Abstract:We introduce a new framework to evaluate and improve first-order (FO) ontologies using automated theorem provers (ATPs) on the basis of competency questions (CQs). Our framework includes both the adaptation of a methodology for evaluating ontologies to the framework of first-order logic and a new set of non-trivial CQs designed to evaluate FO versions of SUMO, which significantly extends the very small set of CQs proposed in the literature. Most of these new CQs have been automatically generated from a small set of patterns and the mapping of WordNet to SUMO. Applying our framework, we demonstrate that Adimen-SUMO v2.2 outperforms TPTP-SUMO. In addition, using the feedback provided by the ATPs, we have produced an improved version of Adimen-SUMO (v2.4). This new version outperforms the previous ones in terms of competency. For instance, "Humans can reason" is automatically inferred from Adimen-SUMO v2.4, while it is deducible from neither TPTP-SUMO nor Adimen-SUMO v2.2.