Raphaël de Fondville

StatBot.Swiss: Bilingual Open Data Exploration in Natural Language

Jun 06, 2024

Farhad Nooralahzadeh, Yi Zhang, Ellery Smith, Sabine Maennel, Cyril Matthey-Doret, Raphaël de Fondville, Kurt Stockinger

Abstract: The potential for improvements brought by Large Language Models (LLMs) in Text-to-SQL systems is mostly assessed on monolingual English datasets. However, LLMs' performance for other languages remains largely unexplored. In this work, we release the StatBot.Swiss dataset, the first bilingual benchmark for evaluating Text-to-SQL systems based on real-world applications. The StatBot.Swiss dataset contains 455 natural language/SQL pairs over 35 large databases with varying levels of complexity for both English and German. We evaluate the performance of state-of-the-art LLMs such as GPT-3.5-Turbo and mixtral-8x7b-instruct for the Text-to-SQL translation task using an in-context learning approach. Our experimental analysis illustrates that current LLMs struggle to generalize well in generating SQL queries on our novel bilingual dataset.

* This work has been accepted at ACL Findings 2024
