Abstract:Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We aim to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large, semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false, in which negation appears in different forms in about two thirds of the corpus. We have used our dataset with the largest available open LLMs in a zero-shot approach to assess their generalization and inference capabilities, and we have also fine-tuned some of the models to determine whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation persists, highlighting the ongoing challenges of LLMs regarding negation understanding and generalization. The dataset and code are publicly available.
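As a concrete illustration of the zero-shot setup described in the abstract above, the following minimal Python sketch prompts an open LLM to label statements as True or False. The model name, prompt template and example sentences are illustrative assumptions, not the paper's actual data or protocol.

# Minimal zero-shot true/false probe for negated commonsense statements.
# Model, prompt and sentences below are assumptions for illustration only.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")  # assumed open LLM

PROMPT = 'Answer with "True" or "False".\nStatement: {sentence}\nAnswer:'

sentences = [
    "A dog is an animal.",       # affirmative, true
    "A dog is not an animal.",   # negated, false
    "A dog is not a building.",  # negated, true
]

for s in sentences:
    prompt = PROMPT.format(sentence=s)
    out = generator(prompt, max_new_tokens=3, do_sample=False)
    answer = out[0]["generated_text"][len(prompt):].strip()
    print(f"{s} -> {answer}")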
Abstract:We describe a detailed analysis of a sample of a large benchmark of commonsense reasoning problems that has been automatically obtained from WordNet, SUMO and their mapping. The objective is to provide a better assessment of the quality of both the benchmark and the involved knowledge resources for advanced commonsense reasoning tasks. By means of this analysis, we are able to detect knowledge misalignments, mapping errors and gaps in the involved knowledge resources. Our final objective is to extract guidelines towards a better exploitation of this commonsense knowledge framework by improving the included resources.
Abstract:In commonsense knowledge representation, the Open World Assumption is adopted as a general standard strategy for the design, construction and use of ontologies, e.g. in OWL. This strategy limits the inference capabilities of any system using these ontologies, because non-asserted statements could be assumed to be alternatively true or false in different interpretations. In this paper, we investigate the application of the Closed World Assumption to enable a better exploitation of the structural knowledge encoded in a SUMO-based ontology. To that end, we explore three different Closed World Assumption formulations for subclass and disjoint relations in order to reduce the ambiguity of the knowledge encoded in first-order logic ontologies. We evaluate these formulations in a practical experiment using a very large commonsense benchmark automatically obtained from the knowledge encoded in WordNet through its mapping to SUMO. The results show that the competency of the ontology improves by more than 47% when reasoning under the Closed World Assumption. In conclusion, automatically applying the Closed World Assumption to first-order logic ontologies reduces their expressed ambiguity and allows more commonsense questions to be answered.
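To give a flavour of what such a formulation can look like, the following is a simplified, assumed example of a closed-world reading of the disjoint relation (not necessarily one of the three formulations evaluated in the paper): any two classes for which no common subclass is asserted in the hierarchy are taken to be disjoint.

% Illustrative closed-world completion of disjointness (an assumption for
% exposition, not the paper's exact formulation):
\forall C_1 \, \forall C_2 \;
  \big( \neg \exists C \, ( \mathit{subclass}(C, C_1) \wedge \mathit{subclass}(C, C_2) )
        \rightarrow \mathit{disjoint}(C_1, C_2) \big)

Here the existential condition is evaluated only against the explicitly asserted (and transitively closed) subclass facts, which is exactly the kind of reading the Closed World Assumption licenses.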
Abstract:Formal ontologies are axiomatizations in a logic-based formalism. The development of formal ontologies, and their important role in the Semantic Web area, is generating considerable research on the use of automated reasoning techniques and tools to help in ontology engineering. One of the main aims is to refine and improve axiomatizations so that automated reasoning tools can efficiently infer reliable information. Defects in the axiomatization can not only cause wrong inferences, but can also hinder the inference of expected information, either by increasing its computational cost or even by preventing it. In this paper, we introduce a novel, fully automatic white-box testing framework for first-order logic ontologies. Our methodology is based on the detection of inference-based redundancies in the given axiomatization. The application of the proposed testing method is fully automatic since a) the automated generation of tests is guided only by the syntax of the axioms and b) the evaluation of tests is performed by automated theorem provers. Our proposal enables the detection of defects and serves to certify the degree of suitability, for reasoning purposes, of every axiom. We formally define the set of tests that are generated from any axiom and prove that every test is logically related to redundancies in the axiom from which it has been generated. We have implemented our method and used this implementation to automatically detect several non-trivial defects that were hidden in various first-order logic ontologies. Throughout the paper we provide illustrative examples of these defects, explain how they were found, and show how each proof, given by an automated theorem prover, provides useful hints on the nature of each defect. Additionally, by correcting all the detected defects, we have obtained an improved version of one of the tested ontologies: Adimen-SUMO.
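One way to picture an inference-based redundancy test of the kind mentioned above (an assumed reading for illustration, not the framework's formal definition of tests) is the following Python sketch: for an axiom of the form (a & b) => c, an ATP is asked whether the ontology already entails the stronger statement a => c, in which case the literal b is redundant. The toy axioms are hypothetical, and the sketch assumes the E theorem prover is installed as the "eprover" binary.

# Sketch of a single redundancy test evaluated with an external ATP (E prover).
# Ontology, axiom under test and prover invocation are illustrative assumptions.
import subprocess, tempfile

ontology = [
    "fof(ax1, axiom, ! [X] : (human(X) => mammal(X))).",
    "fof(ax2, axiom, ! [X] : (mammal(X) => animal(X))).",
]
# Axiom under test: (human(X) & bird(X)) => animal(X). Is bird(X) redundant?
test = "fof(test, conjecture, ! [X] : (human(X) => animal(X)))."

with tempfile.NamedTemporaryFile("w", suffix=".p", delete=False) as f:
    f.write("\n".join(ontology + [test]))
    path = f.name

result = subprocess.run(["eprover", "--auto", "--cpu-limit=10", path],
                        capture_output=True, text=True)
print("redundancy detected" if "SZS status Theorem" in result.stdout
      else "no redundancy found")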
Abstract:In this paper, we report on the practical application of a novel approach to validating the knowledge of WordNet using Adimen-SUMO. In particular, this paper focuses on cross-checking the WordNet meronymy relations against the knowledge encoded in Adimen-SUMO. Our validation approach tests a large set of competency questions (CQs), which are derived (semi-)automatically from the knowledge encoded in WordNet, SUMO and their mapping, by applying efficient first-order logic automated theorem provers. Unfortunately, despite being created manually, these knowledge resources are not free of errors and discrepancies. Consequently, some of the resulting CQs are not plausible according to the knowledge included in Adimen-SUMO. Thus, we focus first on (semi-)automatically improving the alignment between these knowledge resources and, second, on performing a minimal set of corrections in the ontology. Our aim is to minimize the manual effort required for an extensive validation process. We report on the strategies followed, the changes made, the effort needed and their impact when validating the WordNet meronymy relations using the improved versions of the mapping and the ontology. Based on the new results, we discuss the implications of the appropriate corrections and the need for future enhancements.
Abstract:Artificial Intelligence aims to provide computer programs with commonsense knowledge to reason about our world. This paper offers a new practical approach towards automated commonsense reasoning with first-order logic (FOL) ontologies. We propose a new black-box testing methodology for FOL SUMO-based ontologies that exploits WordNet and its mapping into SUMO. Our proposal includes a method for the (semi-)automatic creation of a very large benchmark of competency questions and a procedure for its automated evaluation using automated theorem provers (ATPs). Applying different quality criteria, our testing proposal enables a successful evaluation of a) the competency of several translations of SUMO into FOL and b) the performance of various ATPs. Finally, we also provide a fine-grained and complete analysis of the commonsense reasoning competency of current FOL SUMO-based ontologies.
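To illustrate how competency questions can be (semi-)automatically derived from WordNet and its mapping into SUMO, the following Python sketch turns a hypothetical antonymy pattern over mapped synsets into a TPTP conjecture. The mapping entries, the pattern and the rendering are simplifying assumptions, not the benchmark's exact generation procedure.

# Toy CQ generation: a WordNet relation between mapped synsets becomes a
# first-order conjecture for an ATP. All entries below are assumptions.
mapping = {              # hypothetical WordNet-synset -> SUMO-class mapping
    "man.n.01": "Man",
    "woman.n.01": "Woman",
}
antonym_pairs = [("man.n.01", "woman.n.01")]  # hypothetical antonymy pattern

def cq_from_antonymy(syn_a, syn_b):
    """Render a disjointness-style competency question in TPTP syntax."""
    a, b = mapping[syn_a], mapping[syn_b]
    return (f"fof(cq_{a.lower()}_{b.lower()}, conjecture, "
            f"~ ? [X] : (instance(X, {a}) & instance(X, {b}))).")

for syn_a, syn_b in antonym_pairs:
    print(cq_from_antonymy(syn_a, syn_b))
# -> fof(cq_man_woman, conjecture, ~ ? [X] : (instance(X, Man) & instance(X, Woman))).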
Abstract:We report on the results of evaluating the competency of a first-order ontology for use with automated theorem provers (ATPs). The evaluation follows the adaptation of the methodology based on competency questions (CQs) [Grüninger & Fox, 1995] to the framework of first-order logic presented in [Álvez, Lucio & Rigau, 2015], and is applied to Adimen-SUMO [Álvez, Lucio & Rigau, 2015]. The set of CQs used for this evaluation has been automatically generated from a small set of semantic patterns and the mapping of WordNet to SUMO. Analysing the results, we conclude that it is feasible to use ATPs for working with Adimen-SUMO v2.4, enabling the resolution of goals by performing non-trivial inferences.
Abstract:We introduce a new framework to evaluate and improve first-order (FO) ontologies using automated theorem provers (ATPs) on the basis of competency questions (CQs). Our framework includes both the adaptation of a methodology for evaluating ontologies to the framework of first-order logic and a new set of non-trivial CQs designed to evaluate FO versions of SUMO, which significantly extends the very small set of CQs proposed in the literature. Most of these new CQs have been automatically generated from a small set of patterns and the mapping of WordNet to SUMO. Applying our framework, we demonstrate that Adimen-SUMO v2.2 outperforms TPTP-SUMO. In addition, using the feedback provided by the ATPs, we have produced an improved version of Adimen-SUMO (v2.4). This new version outperforms the previous ones in terms of competency. For instance, "Humans can reason" is automatically inferred from Adimen-SUMO v2.4, while it is deducible from neither TPTP-SUMO nor Adimen-SUMO v2.2.