Daniel Preotiuc-Pietro

Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies

May 27, 2025

Terrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro, Yash Chandarana, Chirag Gupta

Abstract:While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliable uncertainty measures from the LLM. In this paper, we study the problem of providing a calibrated confidence score that conveys the likelihood of an output query being correct. Our work is the first to establish a benchmark for post-hoc calibration of LLM-based text-to-SQL parsing. In particular, we show that Platt scaling, a canonical method for calibration, provides substantial improvements over directly using raw model output probabilities as confidence scores. Furthermore, we propose a method for text-to-SQL calibration that leverages the structured nature of SQL queries to provide more granular signals of correctness, named "sub-clause frequency" (SCF) scores. Using multivariate Platt scaling (MPS), our extension of the canonical Platt scaling technique, we combine individual SCF scores into an overall accurate and calibrated score. Empirical evaluation on two popular text-to-SQL datasets shows that our approach of combining MPS and SCF yields further improvements in calibration and the related task of error detection over traditional Platt scaling.
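
The calibration step described in this abstract can be illustrated with a small sketch. Platt scaling fits a logistic function that maps a raw score to a probability of correctness; a multivariate variant can be pictured as a logistic regression over several scores. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the feature layout (one raw model probability plus one hypothetical SCF-style score), the toy data, and the plain gradient-descent fit are all invented for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(features, labels, lr=0.5, steps=2000):
    """Fit w and b so that sigmoid(w . x + b) approximates P(query is correct).

    With one feature per example this is ordinary Platt scaling; with several
    features it is a multivariate variant (plain logistic regression).
    """
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Gradient of the log loss with respect to the pre-sigmoid score.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def calibrated_confidence(x, w, b):
    """Map a feature vector to a confidence score in (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy held-out calibration set (hypothetical values).
# Each row: [raw model probability, hypothetical sub-clause agreement score].
features = [[0.95, 0.9], [0.90, 0.8], [0.92, 0.2],
            [0.97, 0.1], [0.60, 0.7], [0.85, 0.3]]
labels = [1, 1, 0, 0, 1, 0]  # 1 = generated query was correct

w, b = fit_platt(features, labels)
print(calibrated_confidence([0.9, 0.9], w, b))
print(calibrated_confidence([0.9, 0.1], w, b))
```

In this toy set the raw probability is high even for incorrect queries, so the fitted model has to rely on the second score to separate the two groups.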

Leveraging Contextual Information for Effective Entity Salience Detection

Sep 14, 2023

Rajarshi Bhowmik, Marco Ponza, Atharva Tendle, Anant Gupta, Rebecca Jiang, Xingyu Lu, Qian Zhao, Daniel Preotiuc-Pietro

Abstract:In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed salient entities, provide useful cues of the aboutness of a document to a reader. Identifying the salience of entities was found helpful in several downstream applications such as search, ranking, and entity-centric summarization, among others. Prior work on salient entity detection mainly focused on machine learning models that require heavy feature engineering. We show that fine-tuning medium-sized language models with a cross-encoder style architecture yields substantial performance gains over feature engineering approaches. To this end, we conduct a comprehensive benchmarking of four publicly available datasets using models representative of the medium-sized pre-trained language model family. Additionally, we show that zero-shot prompting of instruction-tuned language models yields inferior results, indicating the task's uniqueness and complexity.

Unsupervised Contrast-Consistent Ranking with Language Models

Sep 13, 2023

Niklas Stoehr, Pengxiang Cheng, Jing Wang, Daniel Preotiuc-Pietro, Rajarshi Bhowmik

Abstract:Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank reviews by sentiment. Recent work focuses on pairwise, pointwise, and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probing model guided by a logical constraint: a model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss, and Ordinal Regression objective. Our results confirm that, for the same language model, CCR probing outperforms prompting and even performs on a par with prompting much larger language models.
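
For context, the binary CCS objective that CCR extends can be sketched in a few lines. The sketch assumes a probe has already produced probabilities `p_pos` and `p_neg` for a statement and its negation. The two-term form (consistency plus confidence) follows the commonly cited CCS formulation rather than anything stated in this abstract, and the function names are hypothetical. The ranking extensions are not shown.

```python
def ccs_loss(p_pos, p_neg):
    """CCS-style loss for one statement/negation pair.

    p_pos: probe probability that the statement is true.
    p_neg: probe probability that the negated statement is true.
    """
    # Consistency: the two probabilities should sum to one.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = min(p_pos, p_neg) ** 2
    return consistency + confidence


def mean_ccs_loss(pairs):
    """Average the loss over a batch of (p_pos, p_neg) pairs."""
    return sum(ccs_loss(p, q) for p, q in pairs) / len(pairs)


# A confident, consistent pair scores lower than an inconsistent one.
print(ccs_loss(0.9, 0.1))
print(ccs_loss(0.9, 0.9))
```

No labels appear in the loss, which is what makes the probing approach unsupervised.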

Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning

May 25, 2023

Genta Indra Winata, Lingjue Xie, Karthik Radhakrishnan, Shijie Wu, Xisen Jin, Pengxiang Cheng, Mayank Kulkarni, Daniel Preotiuc-Pietro

Abstract:Real-life multilingual systems should be able to efficiently incorporate new languages as data distributions fed to the system evolve and shift over time. To do this, systems need to handle the issue of catastrophic forgetting, where the model performance drops for languages or tasks seen further in its past. In this paper, we study catastrophic forgetting, as well as methods to minimize this, in a massively multilingual continual learning framework involving up to 51 languages and covering both classification and sequence labeling tasks. We present LR ADJUST, a learning rate scheduling method that is simple, yet effective in preserving new information without strongly overwriting past knowledge. Furthermore, we show that this method is effective across multiple continual learning approaches. Finally, we provide further insights into the dynamics of catastrophic forgetting in this massively multilingual setup.

* ACL 2023 Findings

Dataless Knowledge Fusion by Merging Weights of Language Models

Dec 19, 2022

Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng

Abstract:Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Oftentimes fine-tuned models are readily available but their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that both performs well across all data set domains and can generalize to out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, thus making it applicable to a wider set of scenarios.
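
One way to picture merging in parameter space guided by prediction differences is the scalar linear case. The sketch below is an illustration under simplifying assumptions, not the paper's procedure: each hypothetical model is a single weight `w_i` in `y = w_i * x`, and each model contributes only the sum of its squared inputs (a summary statistic, not the data itself). The merged weight that minimizes the total squared prediction difference, the sum over models and inputs of `(w * x - w_i * x) ** 2`, is then the average of the `w_i` weighted by those sums.

```python
def input_energy(xs):
    """Sum of squared inputs seen by one model (a scalar Gram statistic)."""
    return sum(x * x for x in xs)


def merge_scalar_models(weights, energies):
    """Closed-form minimizer of sum_i energies[i] * (w - weights[i]) ** 2."""
    total = sum(energies)
    return sum(w * g for w, g in zip(weights, energies)) / total


def prediction_gap(w, weights, energies):
    """Total squared prediction difference between w and each model."""
    return sum(g * (w - wi) ** 2 for wi, g in zip(weights, energies))


# Two hypothetical fine-tuned models and the input statistics they share.
weights = [2.0, 4.0]
energies = [input_energy([1.0, 2.0, 3.0]), input_energy([0.5, 0.5])]

merged = merge_scalar_models(weights, energies)
simple_average = sum(weights) / len(weights)

print(merged, prediction_gap(merged, weights, energies))
print(simple_average, prediction_gap(simple_average, weights, energies))
```

The merged weight stays close to the model whose inputs carry more energy, which is why it yields a smaller prediction gap than the unweighted average in this toy setting.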

EntSUM: A Data Set for Entity-Centric Summarization

Apr 05, 2022

Mounica Maddela, Mayank Kulkarni, Daniel Preotiuc-Pietro

Abstract:Controllable summarization aims to provide summaries that take into account user-specified aspects and preferences to better assist users with their information needs, as opposed to the standard summarization setup, which builds a single generic summary of a document. We introduce a human-annotated data set EntSUM for controllable summarization with a focus on named entities as the aspects to control. We conduct an extensive quantitative analysis to motivate the task of entity-centric summarization and show that existing methods for controllable summarization fail to generate entity-centric summaries. We propose extensions to state-of-the-art summarization approaches that achieve substantially better results on our data set. Our analysis and results show the challenging nature of this task and of the proposed data set.

* Accepted at ACL 2022

Analyzing Political Parody in Social Media

May 01, 2020

Antonis Maronikolakis, Danae Sanchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras

Abstract:Parody is a figurative device used to imitate an entity for comedic or critical purposes and represents a widespread phenomenon in social media through many popular parody accounts. In this paper, we present the first computational study of parody. We introduce a new publicly available data set of tweets from real politicians and their corresponding parody accounts. We run a battery of supervised machine learning models for automatically detecting parody tweets with an emphasis on robustness by testing on tweets from accounts unseen in training, across different genders and across countries. Our results show that political parody tweets can be predicted with an accuracy up to 90%. Finally, we identify the markers of parody through a linguistic analysis. Beyond research in linguistics and political communication, accurately and automatically detecting parody is important to improving fact checking for journalists and analytics such as sentiment analysis through filtering out parodical utterances.

Automatically Identifying Complaints in Social Media

Jun 10, 2019

Daniel Preotiuc-Pietro, Mihaela Gaman, Nikolaos Aletras

Abstract:Complaining is a basic speech act regularly used in human and computer-mediated communication to express a negative mismatch between reality and expectations in a particular situation. Automatically identifying complaints in social media is of utmost importance for organizations or brands to improve the customer experience or in developing dialogue systems for handling and responding to complaints. In this paper, we introduce the first systematic analysis of complaints in computational linguistics. We collect a new annotated data set of written complaints expressed in English on Twitter (data and code are available at https://github.com/danielpreotiuc/complaints-social-media). We present an extensive linguistic analysis of complaining as a speech act in social media and train strong feature-based and neural models of complaints across nine domains achieving a predictive performance of up to 79 F1 using distant supervision.

* Accepted at ACL 2019
