Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wuwei Lan

PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries

Oct 14, 2024

Mingwen Dong, Nischal Ashok Kumar, Yiqun Hu, Anuj Chauhan, Chung-Wei Hang, Shuaichen Chang, Lin Pan, Wuwei Lan, Henghui Zhu, Jiarong Jiang(+2 more)

Figure 1 for PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries

Figure 2 for PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries

Figure 3 for PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries

Figure 4 for PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries

Abstract:Previous text-to-SQL datasets and systems have primarily focused on user questions with clear intentions that can be answered. However, real user questions can often be ambiguous with multiple interpretations or unanswerable due to a lack of relevant data. In this work, we construct a practical conversational text-to-SQL dataset called PRACTIQ, consisting of ambiguous and unanswerable questions inspired by real-world user questions. We first identified four categories of ambiguous questions and four categories of unanswerable questions by studying existing text-to-SQL datasets. Then, we generate conversations with four turns: the initial user question, an assistant response seeking clarification, the user's clarification, and the assistant's clarified SQL response with the natural language explanation of the execution results. For some ambiguous queries, we also directly generate helpful SQL responses, that consider multiple aspects of ambiguity, instead of requesting user clarification. To benchmark the performance on ambiguous, unanswerable, and answerable questions, we implemented large language model (LLM)-based baselines using various LLMs. Our approach involves two steps: question category classification and clarification SQL prediction. Our experiments reveal that state-of-the-art systems struggle to handle ambiguous and unanswerable questions effectively. We will release our code for data generation and experiments on GitHub.

Via

Access Paper or Ask Questions

You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL

Sep 18, 2024

Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng

Figure 1 for You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL

Figure 2 for You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL

Figure 3 for You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL

Figure 4 for You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL

Abstract:While significant progress has been made on the text-to-SQL task, recent solutions repeatedly encode the same database schema for every question, resulting in unnecessary high inference cost and often overlooking crucial database knowledge. To address these issues, we propose You Only Read Once (YORO), a novel paradigm that directly internalizes database knowledge into the parametric knowledge of a text-to-SQL model during training and eliminates the need for schema encoding during inference. YORO significantly reduces the input token length by 66%-98%. Despite its shorter inputs, our empirical results demonstrate YORO's competitive performances with traditional systems on three benchmarks as well as its significant outperformance on large databases. Furthermore, YORO excels in handling questions with challenging value retrievals such as abbreviation.

Via

Access Paper or Ask Questions

UNITE: A Unified Benchmark for Text-to-SQL Evaluation

May 26, 2023

Wuwei Lan, Zhiguo Wang, Anuj Chauhan, Henghui Zhu, Alexander Li, Jiang Guo, Sheng Zhang, Chung-Wei Hang, Joseph Lilien, Yiqun Hu(+8 more)

Figure 1 for UNITE: A Unified Benchmark for Text-to-SQL Evaluation

Figure 2 for UNITE: A Unified Benchmark for Text-to-SQL Evaluation

Figure 3 for UNITE: A Unified Benchmark for Text-to-SQL Evaluation

Abstract:A practical text-to-SQL system should generalize well on a wide variety of natural language questions, unseen database schemas, and novel SQL query structures. To comprehensively evaluate text-to-SQL systems, we introduce a \textbf{UNI}fied benchmark for \textbf{T}ext-to-SQL \textbf{E}valuation (UNITE). It is composed of publicly available text-to-SQL datasets, containing natural language questions from more than 12 domains, SQL queries from more than 3.9K patterns, and 29K databases. Compared to the widely used Spider benchmark \cite{yu-etal-2018-spider}, we introduce $\sim$120K additional examples and a threefold increase in SQL patterns, such as comparative and boolean questions. We conduct a systematic study of six state-of-the-art (SOTA) text-to-SQL parsers on our new benchmark and show that: 1) Codex performs surprisingly well on out-of-domain datasets; 2) specially designed decoding methods (e.g. constrained beam search) can improve performance for both in-domain and out-of-domain settings; 3) explicitly modeling the relationship between questions and schemas further improves the Seq2Seq models. More importantly, our benchmark presents key challenges towards compositional generalization and robustness issues -- which these SOTA models cannot address well. \footnote{Our code and data processing script will be available at \url{https://github.com/XXXX.}}

* 5 pages

Via

Access Paper or Ask Questions

Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

Jan 21, 2023

Shuaichen Chang, Jun Wang, Mingwen Dong, Lin Pan, Henghui Zhu, Alexander Hanbo Li, Wuwei Lan, Sheng Zhang, Jiarong Jiang, Joseph Lilien(+6 more)

Figure 1 for Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

Figure 2 for Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

Figure 3 for Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

Figure 4 for Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness

Abstract:Neural text-to-SQL models have achieved remarkable performance in translating natural language questions into SQL queries. However, recent studies reveal that text-to-SQL models are vulnerable to task-specific perturbations. Previous curated robustness test sets usually focus on individual phenomena. In this paper, we propose a comprehensive robustness benchmark based on Spider, a cross-domain text-to-SQL benchmark, to diagnose the model robustness. We design 17 perturbations on databases, natural language questions, and SQL queries to measure the robustness from different angles. In order to collect more diversified natural question perturbations, we utilize large pretrained language models (PLMs) to simulate human behaviors in creating natural questions. We conduct a diagnostic study of the state-of-the-art models on the robustness set. Experimental results reveal that even the most robust model suffers from a 14.0% performance drop overall and a 50.7% performance drop on the most challenging perturbation. We also present a breakdown analysis regarding text-to-SQL model designs and provide insights for improving model robustness.

* ICLR 2023

Via

Access Paper or Ask Questions

Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Dec 17, 2022

Yiyun Zhao, Jiarong Jiang, Yiqun Hu, Wuwei Lan, Henry Zhu, Anuj Chauhan, Alexander Li, Lin Pan, Jun Wang, Chung-Wei Hang(+7 more)

Figure 1 for Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Figure 2 for Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Figure 3 for Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Figure 4 for Importance of Synthesizing High-quality Data for Text-to-SQL Parsing

Abstract:Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.

Via

Access Paper or Ask Questions

Neural semi-Markov CRF for Monolingual Word Alignment

Jun 16, 2021

Wuwei Lan, Chao Jiang, Wei Xu

Figure 1 for Neural semi-Markov CRF for Monolingual Word Alignment

Figure 2 for Neural semi-Markov CRF for Monolingual Word Alignment

Figure 3 for Neural semi-Markov CRF for Monolingual Word Alignment

Figure 4 for Neural semi-Markov CRF for Monolingual Word Alignment

Abstract:Monolingual word alignment is important for studying fine-grained editing operations (i.e., deletion, addition, and substitution) in text-to-text generation tasks, such as paraphrase generation, text simplification, neutralizing biased language, etc. In this paper, we present a novel neural semi-Markov CRF alignment model, which unifies word and phrase alignments through variable-length spans. We also create a new benchmark with human annotations that cover four different text genres to evaluate monolingual word alignment models in more realistic settings. Experimental results show that our proposed model outperforms all previous approaches for monolingual word alignment as well as a competitive QA-based baseline, which was previously only applied to bilingual data. Our model demonstrates good generalizability to three out-of-domain datasets and shows great utility in two downstream applications: automatic text simplification and sentence pair classification tasks.

* Accepted to ACL 2021

Via

Access Paper or Ask Questions

Neural CRF Model for Sentence Alignment in Text Simplification

May 18, 2020

Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu

Figure 1 for Neural CRF Model for Sentence Alignment in Text Simplification

Figure 2 for Neural CRF Model for Sentence Alignment in Text Simplification

Figure 3 for Neural CRF Model for Sentence Alignment in Text Simplification

Figure 4 for Neural CRF Model for Sentence Alignment in Text Simplification

Abstract:The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all the previous work on monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of better quality compared to the existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state-of-the-art for text simplification in both automatic and human evaluation.

* The paper has been accepted to ACL 2020

Via

Access Paper or Ask Questions

A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks

Apr 30, 2020

Wuwei Lan, Yang Chen, Wei Xu, Alan Ritter

Figure 1 for A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks

Figure 2 for A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks

Abstract:The Arabic language is a morphological rich language, posing many challenges for information extraction (IE) tasks, including Named Entity Recognition (NER), Part-of-Speech tagging (POS), Argument Role Labeling (ARL) and Relation Extraction (RE). A few multilingual pre-trained models have been proposed and show good performance for Arabic, however, most experiment results are reported on language understanding tasks, such as natural language inference, question answering and sentiment analysis. Their performance on the IE tasks is less known, in particular, the cross-lingual transfer capability from English to Arabic. In this work, we pre-train a Gigaword-based bilingual language model (GigaBERT) to study these two distant languages as well as zero-short transfer learning on the information extraction tasks. Our GigaBERT model can outperform mBERT and XLM-R-base on NER, POS and ARL tasks, with regarding to the per-language and/or zero-transfer performance. We make our pre-trained models publicly available at https://github.com/lanwuwei/GigaBERT to facilitate the research of this field.

Via

Access Paper or Ask Questions

Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach

Jul 08, 2019

Wuwei Lan, Yanyan Xu, Bin Zhao

Figure 1 for Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach

Figure 2 for Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach

Figure 3 for Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach

Figure 4 for Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach

Abstract:Travel time estimation is a crucial task for not only personal travel scheduling but also city planning. Previous methods focus on modeling toward road segments or sub-paths, then summing up for a final prediction, which have been recently replaced by deep neural models with end-to-end training. Usually, these methods are based on explicit feature representations, including spatio-temporal features, traffic states, etc. Here, we argue that the local traffic condition is closely tied up with the land-use and built environment, i.e., metro stations, arterial roads, intersections, commercial area, residential area, and etc, yet the relation is time-varying and too complicated to model explicitly and efficiently. Thus, this paper proposes an end-to-end multi-task deep neural model, named Deep Image to Time (DeepI2T), to learn the travel time mainly from the built environment images, a.k.a. the morphological layout images, and showoff the new state-of-the-art performance on real-world datasets in two cities. Moreover, our model is designed to tackle both path-aware and path-blind scenarios in the testing phase. This work opens up new opportunities of using the publicly available morphological layout images as considerable information in multiple geography-related smart city applications.

* Accepted at IJCAI 2019

Via

Access Paper or Ask Questions

Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Aug 23, 2018

Wuwei Lan, Wei Xu

Figure 1 for Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Figure 2 for Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Figure 3 for Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Figure 4 for Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering

Abstract:In this paper, we analyze several neural network designs (and their variations) for sentence pair modeling and compare their performance extensively across eight datasets, including paraphrase identification, semantic textual similarity, natural language inference, and question answering tasks. Although most of these models have claimed state-of-the-art performance, the original papers often reported on only one or two selected datasets. We provide a systematic study and show that (i) encoding contextual information by LSTM and inter-sentence interactions are critical, (ii) Tree-LSTM does not help as much as previously claimed but surprisingly improves performance on Twitter datasets, (iii) the Enhanced Sequential Inference Model is the best so far for larger datasets, while the Pairwise Word Interaction Model achieves the best performance when less data is available. We release our implementations as an open-source toolkit.

* 13 pages; accepted to COLING 2018

Via

Access Paper or Ask Questions