Abstract:Question answering (QA) requires accurately aligning user questions with structured queries, a process often limited by the scarcity of high-quality query-natural language (Q-NL) pairs. To overcome this, we present Q-NL Verifier, an approach to generating high-quality synthetic pairs of queries and NL translations. Our approach relies on large language models (LLMs) to generate semantically precise natural language paraphrases of structured queries. Building on these synthetic Q-NL pairs, we introduce a learned verifier component that automatically determines whether a generated paraphrase is semantically equivalent to the original query. Our experiments with the well-known LC-QuAD 2.0 benchmark show that Q-NL Verifier generalizes well to paraphrases from other models and even human-authored translations. Our approach strongly aligns with human judgments across varying query complexities and outperforms existing NLP metrics in assessing semantic correctness. We also integrate the verifier into QA pipelines, showing that verifier-filtered synthetic data has significantly higher quality in terms of translation correctness and enhances NL to Q translation accuracy. Lastly, we release an updated version of the LC-QuAD 2.0 benchmark containing our synthetic Q-NL pairs and verifier scores, offering a new resource for robust and scalable QA.
Abstract:This chapter delves into the emerging field of neuro-symbolic query optimization for knowledge graphs (KGs), presenting a comprehensive exploration of how neural and symbolic techniques can be integrated to enhance query processing. Traditional query optimizers in knowledge graphs rely heavily on symbolic methods, utilizing dataset summaries, statistics, and cost models to select efficient execution plans. However, these approaches often suffer from misestimations and inaccuracies, particularly when dealing with complex queries or large-scale datasets. Recent advancements have introduced neural models, which capture non-linear aspects of query optimization, offering promising alternatives to purely symbolic methods. In this chapter, we introduce neuro-symbolic query optimizers, a novel approach that combines the strengths of symbolic reasoning with the adaptability of neural computation. We discuss the architecture of these hybrid systems, highlighting the interplay between neural and symbolic components to improve the optimizer's ability to navigate the search space and produce efficient execution plans. Additionally, the chapter reviews existing neural components tailored for optimizing queries over knowledge graphs and examines the limitations and challenges in deploying neuro-symbolic query optimizers in real-world environments.
Abstract:Cardinality Estimation over Knowledge Graphs (KG) is crucial for query optimization, yet remains a challenging task due to the semi-structured nature and complex correlations of typical Knowledge Graphs. In this work, we propose GNCE, a novel approach that leverages knowledge graph embeddings and Graph Neural Networks (GNN) to accurately predict the cardinality of conjunctive queries. GNCE first creates semantically meaningful embeddings for all entities in the KG, which are then integrated into the given query, which is processed by a GNN to estimate the cardinality of the query. We evaluate GNCE on several KGs in terms of q-Error and demonstrate that it outperforms state-of-the-art approaches based on sampling, summaries, and (machine) learning in terms of estimation accuracy while also having lower execution time and less parameters. Additionally, we show that GNCE can inductively generalise to unseen entities, making it suitable for use in dynamic query processing scenarios. Our proposed approach has the potential to significantly improve query optimization and related applications that rely on accurate cardinality estimates of conjunctive queries.