Abstract:Provenance in databases has been thoroughly studied for positive and for recursive queries, then for first-order (FO) queries, i.e., having negation but no recursion. Query evaluation can be understood as a two-player game where the opponents argue whether or not a tuple is in the query answer. This game-theoretic approach yields a natural provenance model for FO queries, unifying how and why-not provenance. Here, we study the fine-grain structure of game provenance. A game $G=(V,E)$ consists of positions $V$ and moves $E$ and can be solved by computing the well-founded model of a single, unstratifiable rule: \[ \text{win}(X) \leftarrow \text{move}(X, Y), \neg \, \text{win}(Y). \] In the solved game $G^{\lambda}$, the value of a position $x\,{\in}\,V$ is either won, lost, or drawn. This value is explained by the provenance $\mathscr{P}$(x), i.e., certain (annotated) edges reachable from $x$. We identify seven edge types that give rise to new kinds of provenance, i.e., potential, actual, and primary, and demonstrate that "not all moves are created equal". We describe the new provenance types, show how they can be computed while solving games, and discuss applications, e.g., for abstract argumentation frameworks.
Abstract:Prompt-based fine-tuning has become an essential method for eliciting information encoded in pre-trained language models for a variety of tasks, including text classification. For multi-class classification tasks, prompt-based fine-tuning under low-resource scenarios has resulted in performance levels comparable to those of fully fine-tuning methods. Previous studies have used crafted prompt templates and verbalizers, mapping from the label terms space to the class space, to solve the classification problem as a masked language modeling task. However, cross-domain and fine-grained prompt-based fine-tuning with an automatically enriched verbalizer remains unexplored, mainly due to the difficulty and costs of manually selecting domain label terms for the verbalizer, which requires humans with domain expertise. To address this challenge, we introduce SciPrompt, a framework designed to automatically retrieve scientific topic-related terms for low-resource text classification tasks. To this end, we select semantically correlated and domain-specific label terms within the context of scientific literature for verbalizer augmentation. Furthermore, we propose a new verbalization strategy that uses correlation scores as additional weights to enhance the prediction performance of the language model during model tuning. Our method outperforms state-of-the-art, prompt-based fine-tuning methods on scientific text classification tasks under few and zero-shot settings, especially in classifying fine-grained and emerging scientific topics.
Abstract:Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has discussed the feasibility of utilizing pre-trained language models to perform entity resolution and achieved promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In this study, we propose Knowledge Augmented Entity Resolution (KAER), a novel framework named for augmenting pre-trained language models with external knowledge for entity resolution. We discuss the results of utilizing different knowledge augmentation and prompting methods to improve entity resolution performance. Our model improves on Ditto, the existing state-of-the-art entity resolution method. In particular, 1) KAER performs more robustly and achieves better results on "dirty data", and 2) with more general knowledge injection, KAER outperforms the existing baseline models on the textual dataset and dataset from the online product domain. 3) KAER achieves competitive results on highly domain-specific datasets, such as citation datasets, requiring the injection of expert knowledge in future work.