Robert Gaizauskas

Department of Computer Science, University of Sheffield, UK

Rulebreakers Challenge: Revealing a Blind Spot in Large Language Models' Reasoning with Formal Logic

Oct 21, 2024

Jason Chan, Robert Gaizauskas, Zhixue Zhao

Abstract:Formal logic has long been applied to natural language reasoning, but this approach can sometimes lead to conclusions that, while logically entailed, are factually inconsistent with the premises or are not typically inferred by humans. This study introduces the concept of "rulebreakers", which refers to instances where logical entailment diverges from factually acceptable inference. We present RULEBREAKERS, a novel dataset for evaluating Large Language Models' (LLMs) ability to distinguish between rulebreakers and non-rulebreakers. Focusing on modus tollens and disjunctive syllogism, we assess six state-of-the-art LLMs using RULEBREAKERS, measuring their performance in terms of token-level exact accuracy and model confidence. Our findings reveal that while most models perform poorly to moderately in recognizing rulebreakers, they demonstrate a latent ability to distinguish rulebreakers when assessed by their confidence levels. Further analysis suggests that the failure to recognize rulebreakers is potentially associated with the models' world knowledge and their attention distribution patterns. This research highlights the limitation of LLMs' reasoning capabilities, and contributes to the ongoing discussion on reasoning in LLMs.

* Preprint
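
For reference, the two inference rules named in the abstract have the following standard textbook forms. This is a notational reminder rather than the paper's own formalisation, and the letters P and Q are placeholder propositions.

```latex
\[
\text{Modus tollens:}\quad \frac{P \rightarrow Q \qquad \neg Q}{\neg P}
\qquad\qquad
\text{Disjunctive syllogism:}\quad \frac{P \lor Q \qquad \neg P}{Q}
\]
```

In the abstract's terms, a rulebreaker is a natural-language instance of such a pattern whose logically entailed conclusion is factually inconsistent with the premises or is not one that humans would typically infer.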

Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection

Mar 13, 2018

Yuxing Tang, Josiah Wang, Xiaofang Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen

Abstract:Deep CNN-based object detection systems have achieved remarkable success on several large-scale object detection benchmarks. However, training such detectors requires a large number of labeled bounding boxes, which are more difficult to obtain than image-level annotations. Previous work addresses this issue by transforming image-level classifiers into object detectors. This is done by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We improve this previous work by incorporating knowledge about object similarities from visual and semantic domains during the transfer process. The intuition behind our proposed method is that visually and semantically similar categories should exhibit more common transferable properties than dissimilar categories, e.g. a better detector would result from transferring the differences between a dog classifier and a dog detector onto the cat class than from transferring them from the violin class. Experimental results on the challenging ILSVRC2013 detection dataset demonstrate that each of our proposed object similarity based knowledge transfer methods outperforms the baseline methods. We found strong evidence that visual similarity and semantic relatedness are complementary for the task, and when combined notably improve detection, achieving state-of-the-art detection performance in a semi-supervised setting.

* Published in IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2017

A Data Driven Approach to Query Expansion in Question Answering

Mar 22, 2012

Leon Derczynski, Jun Wang, Robert Gaizauskas, Mark A. Greenwood

Abstract:Automated answering of natural language questions is an interesting and useful problem to solve. Question answering (QA) systems often perform information retrieval at an initial stage. Information retrieval (IR) performance, provided by engines such as Lucene, places a bound on overall system performance. For example, no answer bearing documents are retrieved at low ranks for almost 40% of questions. In this paper, answer texts from previous QA evaluations held as part of the Text REtrieval Conferences (TREC) are paired with queries and analysed in an attempt to identify performance-enhancing words. These words are then used to evaluate the performance of a query expansion method. Data driven extension words were found to help in over 70% of difficult questions. These words can be used to improve and evaluate query expansion methods. Simple blind relevance feedback (RF) was correctly predicted as unlikely to help overall performance, and a possible explanation is provided for its low value in IR for QA.

* Proc. IR4QA Workshop (2008) 34-41

USFD at KBP 2011: Entity Linking, Slot Filling and Temporal Bounding

Mar 22, 2012

Amev Burman, Arun Jayapal, Sathish Kannan, Madhu Kavilikatta, Ayman Alhelbawy, Leon Derczynski, Robert Gaizauskas

Abstract:This paper describes the University of Sheffield's entry in the 2011 TAC KBP entity linking and slot filling tasks. We chose to participate in the monolingual entity linking task, the monolingual slot filling task and the temporal slot filling tasks. We set out to build a framework for experimentation with knowledge base population. This framework was created, and applied to multiple KBP tasks. We demonstrated that our proposed framework is effective and suitable for collaborative development efforts, as well as useful in a teaching environment. Finally we present results that, while very modest, provide improvements an order of magnitude greater than our 2010 attempt.

* Proc. Text Analysis Conference (2011)

A Corpus-based Study of Temporal Signals

Mar 22, 2012

Leon Derczynski, Robert Gaizauskas

Abstract:Automatic temporal ordering of events described in discourse has been of great interest in recent years. Event orderings are conveyed in text via various linguistic mechanisms including the use of expressions such as "before", "after" or "during" that explicitly assert a temporal relation -- temporal signals. In this paper, we investigate the role of temporal signals in temporal relation extraction and provide a quantitative analysis of these expressions in the TimeBank annotated corpus.

* Proceedings of the 6th Conference on Corpus Linguistics (2011), No. 197, pp. 1--8
* Proc. Corpus Linguistics (2011)

An Annotation Scheme for Reichenbach's Verbal Tense Structure

Mar 22, 2012

Leon Derczynski, Robert Gaizauskas

Abstract:In this paper we present RTMML, a markup language for the tenses of verbs and temporal relations between verbs. There is a richness to tense in language that is not fully captured by existing temporal annotation schemata. Following Reichenbach we present an analysis of tense in terms of abstract time points, with the aim of supporting automated processing of tense and temporal relations in language. This allows for precise reasoning about tense in documents, and the deduction of temporal relations between the times and verbal events in a discourse. We define the syntax of RTMML, and demonstrate the markup in a range of situations.

* Proc. 6th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (2011) 10-17

USFD2: Annotating Temporal Expresions and TLINKs for TempEval-2

Mar 22, 2012

Leon Derczynski, Robert Gaizauskas

Abstract:We describe the University of Sheffield system used in the TempEval-2 challenge, USFD2. The challenge requires the automatic identification of temporal entities and relations in text. USFD2 identifies and anchors temporal expressions, and also attempts two of the four temporal relation assignment tasks. A rule-based system picks out and anchors temporal expressions, and a maximum entropy classifier assigns temporal link labels, based on features that include descriptions of associated temporal signal words. USFD2 identified temporal expressions successfully, and correctly classified their type in 90% of cases. Determining the relation between an event and time expression in the same sentence was performed at 63% accuracy, the second highest score in this part of the challenge.

* Proc. 5th International Workshop on Semantic Evaluation (2010) 337-340
* Part of TempEval-2

Using Signals to Improve Automatic Classification of Temporal Relations

Mar 22, 2012

Leon Derczynski, Robert Gaizauskas

Abstract:Temporal information conveyed by language describes how the world around us changes through time. Events, durations and times are all temporal elements that can be viewed as intervals. These intervals are sometimes temporally related in text. Automatically determining the nature of such relations is a complex and unsolved problem. Some words can act as "signals" which suggest a temporal ordering between intervals. In this paper, we use these signal words to improve the accuracy of a recent approach to classification of temporal links.

Analysing Temporally Annotated Corpora with CAVaT

Mar 22, 2012

Leon Derczynski, Robert Gaizauskas

Abstract:We present CAVaT, a tool that performs Corpus Analysis and Validation for TimeML. CAVaT is an open source, modular checking utility for statistical analysis of features specific to temporally-annotated natural language corpora. It provides reporting, highlights salient links between a variety of general and time-specific linguistic features, and also validates a temporal annotation to ensure that it is logically consistent and sufficiently annotated. Uniquely, CAVaT provides analysis specific to TimeML-annotated temporal information. TimeML is a standard for annotating temporal information in natural language text. In this paper, we present the reporting part of CAVaT, and then its error-checking ability, including the workings of several novel TimeML document verification methods. This is followed by the execution of some example tasks using the tool to show relations between times, events, signals and links. We also demonstrate inconsistencies in a TimeML corpus (TimeBank) that have been detected with CAVaT.

* Proc. LREC (2010) 398-404

Compacting the Penn Treebank Grammar

Jan 31, 1999

Alexander Krotov, Mark Hepple, Robert Gaizauskas, Yorick Wilks

Abstract:Treebanks, such as the Penn Treebank (PTB), offer a simple approach to obtaining a broad coverage grammar: one can simply read the grammar off the parse trees in the treebank. While such a grammar is easy to obtain, a square-root rate of growth of the rule set with corpus size suggests that the derived grammar is far from complete and that much more treebanked text would be required to obtain a complete grammar, if one exists at some limit. However, we offer an alternative explanation in terms of the underspecification of structures within the treebank. This hypothesis is explored by applying an algorithm to compact the derived grammar by eliminating redundant rules -- rules whose right hand sides can be parsed by other rules. The size of the resulting compacted grammar, which is significantly less than that of the full treebank grammar, is shown to approach a limit. However, such a compacted grammar does not yield very good performance figures. A version of the compaction algorithm taking rule probabilities into account is proposed, which is argued to be more linguistically motivated. Combined with simple thresholding, this method can be used to give a 58% reduction in grammar size without significant change in parsing performance, and can produce a 69% reduction with some gain in recall, but a loss in precision.

* In Proceedings of COLING-98 (Montreal), pages 699-703
* 5 pages, 2 figures
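
The basic compaction idea in the abstract (dropping rules whose right hand sides can be parsed by other rules) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `can_parse` and `compact`, the toy grammar, and the simple chart recognizer are all assumptions made here, and the probabilistic and thresholded variants mentioned in the abstract are not covered.

```python
# Hypothetical sketch: rules are (lhs, rhs) pairs, rhs is a non-empty tuple of symbols.

def _tiles(chart, rhs, i, j):
    """True if the symbols of rhs can cover span [i, j) using chart entries."""
    if len(rhs) == 1:
        return rhs[0] in chart.get((i, j), ())
    # Leave at least one position of the span for each remaining rhs symbol.
    last_split = j - (len(rhs) - 1)
    return any(
        rhs[0] in chart.get((i, m), ()) and _tiles(chart, rhs[1:], m, j)
        for m in range(i + 1, last_split + 1)
    )

def can_parse(rules, target, seq):
    """Bottom-up recognizer: can `seq` (a tuple of symbols) be analysed as `target`?"""
    n = len(seq)
    # Each input symbol trivially covers its own position.
    chart = {(i, i + 1): {seq[i]} for i in range(n)}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart.setdefault((i, j), set())
            changed = True
            while changed:  # repeat so that chains of unary rules are followed
                changed = False
                for lhs, rhs in rules:
                    if lhs not in cell and _tiles(chart, rhs, i, j):
                        cell.add(lhs)
                        changed = True
    return target in chart[(0, n)]

def compact(rules):
    """Drop each rule whose rhs the remaining rules can already parse as its lhs."""
    kept = list(rules)
    for rule in list(rules):
        others = [r for r in kept if r != rule]
        lhs, rhs = rule
        if can_parse(others, lhs, rhs):
            kept = others  # the rule is redundant, so leave it out for good
    return kept

# Toy grammar (invented for illustration, not taken from the Penn Treebank).
rules = [
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("PP", ("IN", "NP")),
    ("NP", ("DT", "NN", "PP")),  # redundant: DT NN -> NP, then NP PP -> NP
]
compacted = compact(rules)
print(compacted)
```

In this toy grammar only the flat rule `NP -> DT NN PP` is removed, because the other three rules can already build an `NP` over the same symbol sequence. The result can depend on the order in which rules are visited; the sketch simply uses list order.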
