Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Derian Boer

Enhancing Symbolic Machine Learning by Subsymbolic Representations

Jun 17, 2025

Stephen Roth, Lennart Baur, Derian Boer, Stefan Kramer

Abstract:The goal of neuro-symbolic AI is to integrate symbolic and subsymbolic AI approaches, to overcome the limitations of either. Prominent systems include Logic Tensor Networks (LTN) or DeepProbLog, which offer neural predicates and end-to-end learning. The versatility of systems like LTNs and DeepProbLog, however, makes them less efficient in simpler settings, for instance, for discriminative machine learning, in particular in domains with many constants. Therefore, we follow a different approach: We propose to enhance symbolic machine learning schemes by giving them access to neural embeddings. In the present paper, we show this for TILDE and embeddings of constants used by TILDE in similarity predicates. The approach can be fine-tuned by further refining the embeddings depending on the symbolic theory. In experiments in three real-world domain, we show that this simple, yet effective, approach outperforms all other baseline methods in terms of the F1 score. The approach could be useful beyond this setting: Enhancing symbolic learners in this way could be extended to similarities between instances (effectively working like kernels within a logical language), for analogical reasoning, or for propositionalization.

Via

Access Paper or Ask Questions

Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases

May 14, 2025

Derian Boer, Stephen Roth, Stefan Kramer

Abstract:In many real-world settings, machine learning models and interactive systems have access to both structured knowledge, e.g., knowledge graphs or tables, and unstructured content, e.g., natural language documents. However, most rely on either. Semi-Structured Knowledge Bases (SKBs) bridge this gap by linking unstructured content to nodes within structured data, thereby enabling new strategies for knowledge access and use. In this work, we present FocusedRetriever, a modular SKB-based framework for multi-hop question answering. It integrates components (VSS-based entity search, LLM-based generation of Cypher queries and pairwise re-ranking) in a way that enables it to outperform state-of-the-art methods across all three STaRK benchmark test sets, covering diverse domains and multiple performance metrics. The average first-hit rate exceeds that of the second-best method by 25.7%. FocusedRetriever leverages (1) the capacity of Large Language Models (LLMs) to extract relational facts and entity attributes from unstructured text, (2) node set joins to filter answer candidates based on these extracted triplets and constraints, (3) vector similarity search to retrieve and rank relevant unstructured content, and (4) the contextual capabilities of LLMs to finally rank the top-k answers. For generality, we only incorporate base LLMs in FocusedRetriever in our evaluation. However, our analysis of intermediate results highlights several opportunities for further upgrades including finetuning. The source code is publicly available at https://github.com/kramerlab/FocusedRetriever .

Via

Access Paper or Ask Questions

Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering

Sep 01, 2024

Derian Boer, Fabian Koch, Stefan Kramer

Abstract:Large Language Models (LLMs) frequently lack domain-specific knowledge and even fine-tuned models tend to hallucinate. Hence, more reliable models that can include external knowledge are needed. We present a pipeline, 4StepFocus, and specifically a preprocessing step, that can substantially improve the answers of LLMs. This is achieved by providing guided access to external knowledge making use of the model's ability to capture relational context and conduct rudimentary reasoning by themselves. The method narrows down potentially correct answers by triplets-based searches in a semi-structured knowledge base in a direct, traceable fashion, before switching to latent representations for ranking those candidates based on unstructured data. This distinguishes it from related methods that are purely based on latent representations. 4StepFocus consists of the steps: 1) Triplet generation for extraction of relational data by an LLM, 2) substitution of variables in those triplets to narrow down answer candidates employing a knowledge graph, 3) sorting remaining candidates with a vector similarity search involving associated non-structured data, 4) reranking the best candidates by the LLM with background data provided. Experiments on a medical, a product recommendation, and an academic paper search test set demonstrate that this approach is indeed a powerful augmentation. It not only adds relevant traceable background information from information retrieval, but also improves performance considerably in comparison to state-of-the-art methods. This paper presents a novel, largely unexplored direction and therefore provides a wide range of future work opportunities. Used source code is available at https://github.com/kramerlab/4StepFocus.

* 9 pages, published at IJCLR 2024

Via

Access Paper or Ask Questions

Secure Sum Outperforms Homomorphic Encryption in (Current) Collaborative Deep Learning

Jun 02, 2020

Derian Boer, Stefan Kramer

Figure 1 for Secure Sum Outperforms Homomorphic Encryption in (Current) Collaborative Deep Learning

Figure 2 for Secure Sum Outperforms Homomorphic Encryption in (Current) Collaborative Deep Learning

Figure 3 for Secure Sum Outperforms Homomorphic Encryption in (Current) Collaborative Deep Learning

Figure 4 for Secure Sum Outperforms Homomorphic Encryption in (Current) Collaborative Deep Learning

Abstract:Deep learning (DL) approaches are achieving extraordinary results in a wide range of domains but often require a massive collection of private data. Hence, methods for training neural networks on the joint data of different data owners, that keep each party's input confidential, are called for. We address the setting of horizontally distributed data in deep learning, where the participants' vulnerable intermediate results have to be processed in a privacy-preserving manner. The predominant scheme for this setting is based on homomorphic encryption (HE), and it is widely considered to be without alternative. In contrast to this, we demonstrate that a carefully chosen, less complex and computationally less expensive secure sum protocol in conjunction with default secure channels exhibits superior properties in terms of both collusion-resistance and runtime. Finally, we discuss several open research questions in the context of collaborative DL, which possibly might lead back to HE-based solutions.

* submitted to Journal of Artificial Intelligence

Via

Access Paper or Ask Questions