Stergios Chatzikyriakidis

Reasoning with RAGged events: RAG-Enhanced Event Knowledge Base Construction and reasoning with proof-assistants

Jun 08, 2025

Stergios Chatzikyriakidis

Abstract:Extracting structured computational representations of historical events from narrative text remains computationally expensive when constructed manually. While RDF/OWL reasoners enable graph-based reasoning, they are limited to fragments of first-order logic, preventing deeper temporal and semantic analysis. This paper addresses both challenges by developing automatic historical event extraction models using multiple LLMs (GPT-4, Claude, Llama 3.2) with three enhancement strategies: pure base generation, knowledge graph enhancement, and Retrieval-Augmented Generation (RAG). We conducted comprehensive evaluations using historical texts from Thucydides. Our findings reveal that enhancement strategies optimize different performance dimensions rather than providing universal improvements. For coverage and historical breadth, base generation achieves optimal performance with Claude and GPT-4 extracting comprehensive events. However, for precision, RAG enhancement improves coordinate accuracy and metadata completeness. Model architecture fundamentally determines enhancement sensitivity: larger models demonstrate robust baseline performance with incremental RAG improvements, while Llama 3.2 shows extreme variance from competitive performance to complete failure. We then developed an automated translation pipeline converting extracted RDF representations into Coq proof assistant specifications, enabling higher-order reasoning beyond RDF capabilities including multi-step causal verification, temporal arithmetic with BC dates, and formal proofs about historical causation. The Coq formalization validates that RAG-discovered event types represent legitimate domain-specific semantic structures rather than ontological violations.

On Tables with Numbers, with Numbers

Aug 14, 2024

Konstantinos Kogkalidis, Stergios Chatzikyriakidis

Abstract:This paper is a critical reflection on the epistemic culture of contemporary computational linguistics, framed in the context of its growing obsession with tables with numbers. We argue against tables with numbers on the basis of their epistemic irrelevance, their environmental impact, their role in enabling and exacerbating social inequalities, and their deep ties to commercial applications and profit-driven research. We substantiate our arguments with empirical evidence drawn from a meta-analysis of computational linguistics research over the last decade.

* v2: corrected Figure 2 scale and caption (thanks go to Ernest Davis)

OYXOY: A Modern NLP Test Suite for Modern Greek

Sep 13, 2023

Konstantinos Kogkalidis, Stergios Chatzikyriakidis, Eirini Chrysovalantou Giannikouri, Vassiliki Katsouli, Christina Klironomou, Christina Koula, Dimitris Papadakis, Thelka Pasparaki, Erofili Psaltaki, Efthymia Sakellariou(+1 more)

Abstract:This paper serves as a foundational step towards the development of a linguistically motivated and technically relevant evaluation suite for Greek NLP. We initiate this endeavor by introducing four expert-verified evaluation tasks, specifically targeted at natural language inference, word sense disambiguation (through example comparison or sense selection) and metaphor detection. More than language-adapted replicas of existing tasks, we contribute two innovations which will resonate with the broader resource and evaluation community. Firstly, our inference dataset is the first of its kind, marking not just one, but rather all possible inference labels, accounting for possible shifts due to e.g. ambiguity or polysemy. Secondly, we demonstrate a cost-efficient method to obtain datasets for under-resourced languages. Using ChatGPT as a language-neutral parser, we transform the Dictionary of Standard Modern Greek into a structured format, from which we derive the other three tasks through simple projections. Alongside each task, we conduct experiments using currently available state-of-the-art machinery. Our experimental baselines affirm the challenging nature of our tasks and highlight the need for expedited progress in order for the Greek NLP ecosystem to keep pace with contemporary mainstream research.

GRDD: A Dataset for Greek Dialectal NLP

Aug 01, 2023

Stergios Chatzikyriakidis, Chatrine Qwaider, Ilias Kolokousis, Christina Koula, Dimitris Papadakis, Efthymia Sakellariou

Abstract:In this paper, we present a dataset for the computational study of a number of Modern Greek dialects. It consists of raw text data from four dialects of Modern Greek: Cretan, Pontic, Northern Greek and Cypriot Greek. The dataset is of considerable size, albeit imbalanced, and presents the first attempt to create large-scale dialectal resources of this type for Modern Greek dialects. We then use the dataset to perform dialect identification. We experiment with traditional ML algorithms, as well as simple DL architectures. The results show very good performance on the task, potentially revealing that the dialects in question have distinct enough characteristics to allow even simple ML models to perform well on the task. Error analysis is performed for the top-performing algorithms, showing that in a number of cases the errors are due to insufficient dataset cleaning.

How Does Data Corruption Affect Natural Language Understanding Models? A Study on GLUE datasets

Jan 12, 2022

Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis, Jörg Tiedemann

Abstract:A central question in natural language understanding (NLU) research is whether high performance demonstrates the models' strong reasoning capabilities. We present an extensive series of controlled experiments where pre-trained language models are exposed to data that have undergone specific corruption transformations. The transformations involve removing instances of specific word classes and often lead to non-sensical sentences. Our results show that performance remains high for most GLUE tasks when the models are fine-tuned or tested on corrupted data, suggesting that the models leverage other cues for prediction even in non-sensical contexts. Our proposed data transformations can be used as a diagnostic tool for assessing the extent to which a specific dataset constitutes a proper testbed for evaluating models' language understanding capabilities.

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

Apr 10, 2021

Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis, Jörg Tiedemann

Abstract:Pre-trained neural language models give high performance on natural language inference (NLI) tasks. But whether they actually understand the meaning of the processed sequences remains unclear. We propose a new diagnostic test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to non-sensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Conversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.

* NoDaLiDa 2021 camera ready

FraCaS: Temporal Analysis

Dec 19, 2020

Jean-Philippe Bernardy, Stergios Chatzikyriakidis

Abstract:In this paper, we propose an implementation of temporal semantics which is suitable for inference problems. This implementation translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. We support several phenomena, including temporal references, temporal adverbs, aspectual classes and progressives. We apply these semantics to the complete FraCaS test suite. We obtain an accuracy of 81 percent overall and 73 percent for problems explicitly marked as related to temporal reference.

A corpus of precise natural textual entailment problems

Dec 14, 2018

Jean-Philippe Bernardy, Stergios Chatzikyriakidis

Abstract:In this paper, we present a new corpus of entailment problems. This corpus combines the following characteristics: (1) it is precise (it does not leave out implicit hypotheses); (2) it is based on "real-world" texts (i.e. most of the premises were written for purposes other than testing textual entailment); (3) its size is 150 problems. The corpus was constructed by taking problems from the Real Text Entailment and discovering missing hypotheses using a crowd of experts. We believe that this corpus constitutes a first step towards wide-coverage testing of precise natural-language inference systems.

* 34 pages including appendices

Testing the Generalization Power of Neural Network Models Across NLI Benchmarks

Oct 30, 2018

Aarne Talman, Stergios Chatzikyriakidis

Abstract:Neural network models have been very successful for natural language inference, with the best models reaching 90% accuracy in some benchmarks. However, the success of these models turns out to be largely benchmark specific. We show that models trained on natural language inference datasets drawn from one benchmark fail to perform well in others, even if the notion of inference assumed in these benchmark tasks is the same or similar. We train five state-of-the-art neural network models on different datasets and show that each of them fails to generalize outside of the respective benchmark. In light of these results we conclude that the current neural network models are not able to generalize in capturing the semantics of natural language inference, but seem to be overfitting to the specific dataset.
