Daniel Jaroslawicz

The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models

Aug 14, 2024

Karime Maamari, Fadhil Abubaker, Daniel Jaroslawicz, Amine Mhedhbi

Abstract: Schema linking is a crucial step in Text-to-SQL pipelines, which translate natural language queries into SQL. The goal of schema linking is to retrieve relevant tables and columns (signal) while disregarding irrelevant ones (noise). However, imperfect schema linking can often exclude essential columns needed for accurate query generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find empirically that newer models are adept at identifying relevant schema elements during generation, without the need for explicit schema linking. This allows Text-to-SQL pipelines to bypass schema linking entirely and instead pass the full database schema to the LLM, eliminating the risk of excluding necessary information. Furthermore, as alternatives to schema linking, we propose techniques that improve Text-to-SQL accuracy without compromising on essential schema information. Our approach achieves 71.83% execution accuracy on the BIRD benchmark, ranking first at the time of submission.
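
The idea of passing the full database schema to the model, rather than a filtered subset, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `full_schema_prompt`, the prompt wording, and the toy tables are all assumptions made for illustration, and the call to an actual LLM is left out.

```python
import sqlite3


def full_schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build a prompt containing every table definition (no schema linking).

    Hypothetical helper: the prompt layout is an assumption, not taken
    from the paper.
    """
    # sqlite_master stores the CREATE statement of each table.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    schema = "\n\n".join(row[0] for row in rows)
    return (
        "Database schema:\n"
        f"{schema}\n\n"
        f"Question: {question}\n"
        "Write a single SQL query that answers the question."
    )


# Toy database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), total REAL)"
)

question = "What is the total spend per customer?"
prompt = full_schema_prompt(conn, question)
print(prompt)
```

Because nothing is filtered out, no needed column can be dropped before generation; the trade-off is a longer prompt for large schemas.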

Retrieve to Explain: Evidence-driven Predictions with Language Models

Feb 06, 2024

Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane Corneil

Abstract: Machine learning models, particularly language models, are notoriously difficult to introspect. Black-box models can mask both issues in model training and harmful biases. For human-in-the-loop processes, opaque predictions can drive lack of trust, limiting a model's impact even when it performs effectively. To address these issues, we introduce Retrieve to Explain (R2E). R2E is a retrieval-based language model that prioritizes amongst a pre-defined set of possible answers to a research question based on the evidence in a document corpus, using Shapley values to identify the relative importance of pieces of evidence to the final prediction. R2E can adapt to new evidence without retraining, and incorporate structured data through templating into natural language. We assess on the use case of drug target identification from published scientific literature, where we show that the model outperforms an industry-standard genetics-based approach on predicting clinical trial outcomes.
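
To make the role of Shapley values concrete, here is a minimal sketch of exact Shapley attribution over a handful of evidence items. It is not R2E itself: the scoring function `toy_score`, the evidence names, and their weights are hypothetical stand-ins for a model's prediction given a subset of evidence, and exact enumeration is only practical for small sets.

```python
from itertools import combinations
from math import factorial


def shapley_values(items, value):
    """Exact Shapley value of each item under the set function `value`."""
    n = len(items)
    phi = {}
    for item in items:
        others = [x for x in items if x != item]
        total = 0.0
        for k in range(n):
            # Weight of coalitions of size k that exclude `item`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of `item` to coalition `s`.
                total += weight * (value(s | {item}) - value(s))
        phi[item] = total
    return phi


# Hypothetical per-evidence weights (assumed for illustration).
WEIGHTS = {"e1": 0.5, "e2": 0.2, "e3": 0.1}


def toy_score(subset):
    """Toy prediction score: additive weights plus a bonus when e1 and e2
    appear together (a stand-in for a real model's output)."""
    score = sum(WEIGHTS[e] for e in subset)
    if "e1" in subset and "e2" in subset:
        score += 0.2
    return score


evidence = ["e1", "e2", "e3"]
phi = shapley_values(evidence, toy_score)
for name in evidence:
    print(name, round(phi[name], 3))
```

The attributions sum to the difference between the score with all evidence and the score with none, and the interaction bonus is split evenly between the two items that produce it.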
