Title: ESM+: Modern Insights into Perspective on Text-to-SQL Evaluation in the Age of Large Language Models

Jul 10, 2024

Benjamin Ascoli, Ram Kandikonda, Jinho D. Choi

Abstract: The task of Text-to-SQL enables anyone to retrieve information from SQL databases using natural language. Despite several challenges, recent models have made remarkable advances on this task using large language models (LLMs). Interestingly, we find that LLM-based models without fine-tuning behave differently from their fine-tuned counterparts, so current evaluation metrics fail to convey their performance accurately. We therefore analyze the two primary metrics, Test Suite Execution Accuracy (EXE) and Exact Set Matching Accuracy (ESM), to examine their robustness for this task and address their shortcomings. We compare the performance of 9 LLM-based models using EXE, the original ESM, and our improved ESM (called ESM+). Our results show that EXE and ESM have high false positive and false negative rates of 11.3% and 13.9%, while ESM+ achieves 0.1% and 2.6%, respectively, providing a significantly more stable evaluation. We release the ESM+ script as open source so that the community can contribute to it and benefit from a more reliable assessment of Text-to-SQL.
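To make the notion of a false positive concrete, here is a minimal sketch of an execution-based check. It is not the paper's EXE or ESM+ script. The `execution_match` helper, the `singer` table, and the three queries are hypothetical, chosen only for illustration. The sketch assumes that an execution-based metric accepts a predicted query when it returns the same rows as the gold query on the available data.

```python
import sqlite3
from collections import Counter


def execution_match(conn, gold_sql, pred_sql):
    """Hypothetical helper: True when both queries return the same multiset of rows."""
    gold_rows = conn.execute(gold_sql).fetchall()
    pred_rows = conn.execute(pred_sql).fetchall()
    return Counter(gold_rows) == Counter(pred_rows)


# Toy in-memory database (illustrative schema, not from the paper).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 30), (2, 'Bob', 41);
""")

gold = "SELECT name FROM singer WHERE age > 40"
wrong = "SELECT name FROM singer WHERE age > 35"  # different meaning
equivalent = "SELECT s.name FROM singer AS s WHERE NOT (s.age <= 40)"  # same meaning

# On this data the wrong query returns the same rows as the gold query,
# so an execution check would accept it: a false positive.
before = execution_match(conn, gold, wrong)

# One extra row that separates the two predicates exposes the difference.
conn.execute("INSERT INTO singer VALUES (3, 'Cy', 38)")
after = execution_match(conn, gold, wrong)

# A differently written but equivalent query still matches on execution.
equiv = execution_match(conn, gold, equivalent)

print(before, after, equiv)
```

In this toy setting, whether the check catches the wrong query depends entirely on the rows in the database. That is one way an execution-based metric can produce false positives, while a metric that compares query structure too strictly could reject the equivalent query and produce a false negative.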
