Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Robert L. Logan IV

Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

Dec 17, 2024

Roberto Mondini, Neema Kotonya, Robert L. Logan IV, Elizabeth M Olson, Angela Oduor Lungati, Daniel Duke Odongo, Tim Ombasa, Hemank Lamba, Aoife Cahill, Joel R. Tetreault(+1 more)

Abstract:Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.

* COLING 2025

Via

Access Paper or Ask Questions

BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics

Dec 20, 2022

Liang Ma, Shuyang Cao, Robert L. Logan IV, Di Lu, Shihao Ran, Ke Zhang, Joel Tetreault, Aoife Cahill, Alejandro Jaimes

Abstract:The proliferation of automatic faithfulness metrics for summarization has produced a need for benchmarks to evaluate them. While existing benchmarks measure the correlation with human judgements of faithfulness on model-generated summaries, they are insufficient for diagnosing whether metrics are: 1) consistent, i.e., decrease as errors are introduced into a summary, 2) effective on human-written texts, and 3) sensitive to different error types (as summaries can contain multiple errors). To address these needs, we present a benchmark of unfaithful minimal pairs (BUMP), a dataset of 889 human-written, minimally different summary pairs, where a single error (from an ontology of 7 types) is introduced to a summary from the CNN/DailyMail dataset to produce an unfaithful summary. We find BUMP complements existing benchmarks in a number of ways: 1) the summaries in BUMP are harder to discriminate and less probable under SOTA summarization models, 2) BUMP enables measuring the consistency of metrics, and reveals that the most discriminative metrics tend not to be the most consistent, 3) BUMP enables the measurement of metrics' performance on individual error types and highlights areas of weakness for future work.

Via

Access Paper or Ask Questions

Continued Pretraining for Better Zero- and Few-Shot Promptability

Oct 19, 2022

Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk Groeneveld, Sameer Singh, Iz Beltagy

Figure 1 for Continued Pretraining for Better Zero- and Few-Shot Promptability

Figure 2 for Continued Pretraining for Better Zero- and Few-Shot Promptability

Figure 3 for Continued Pretraining for Better Zero- and Few-Shot Promptability

Figure 4 for Continued Pretraining for Better Zero- and Few-Shot Promptability

Abstract:Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work, we investigate if a dedicated continued pretraining stage could improve "promptability", i.e., zero-shot performance with natural language prompts or few-shot performance with prompt tuning. We reveal settings where existing continued pretraining methods lack promptability. We also identify current methodological gaps, which we fill with thorough large-scale experiments. We demonstrate that a simple recipe, continued pretraining that incorporates a trainable prompt during multi-task learning, leads to improved promptability in both zero- and few-shot settings compared to existing methods, up to 31% relative. On the other hand, we find that continued pretraining using MAML-style meta-learning, a method that directly optimizes few-shot promptability, yields subpar performance. We validate our findings with two prompt tuning methods, and, based on our results, we provide concrete recommendations to optimize promptability for different use cases.

* EMNLP 2022

Via

Access Paper or Ask Questions

Impact of Pretraining Term Frequencies on Few-Shot Reasoning

Feb 15, 2022

Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh

Figure 1 for Impact of Pretraining Term Frequencies on Few-Shot Reasoning

Figure 2 for Impact of Pretraining Term Frequencies on Few-Shot Reasoning

Figure 3 for Impact of Pretraining Term Frequencies on Few-Shot Reasoning

Figure 4 for Impact of Pretraining Term Frequencies on Few-Shot Reasoning

Abstract:Pretrained Language Models (LMs) have demonstrated ability to perform numerical reasoning by extrapolating from a few examples in few-shot settings. However, the extent to which this extrapolation relies on robust reasoning is unclear. In this paper, we investigate how well these models reason with terms that are less frequent in the pretraining data. In particular, we examine the correlations between the model performance on test instances and the frequency of terms from those instances in the pretraining data. We measure the strength of this correlation for a number of GPT-based language models (pretrained on the Pile dataset) on various numerical deduction tasks (e.g., arithmetic and unit conversion). Our results consistently demonstrate that models are more accurate on instances whose terms are more prevalent, in some cases above $70\%$ (absolute) more accurate on the top 10\% frequent terms in comparison to the bottom 10\%. Overall, although LMs exhibit strong performance at few-shot numerical reasoning tasks, our results raise the question of how much models actually generalize beyond pretraining data, and we encourage researchers to take the pretraining data into account when interpreting evaluation results.

Via

Access Paper or Ask Questions

FRUIT: Faithfully Reflecting Updated Information in Text

Dec 16, 2021

Robert L. Logan IV, Alexandre Passos, Sameer Singh, Ming-Wei Chang

Figure 1 for FRUIT: Faithfully Reflecting Updated Information in Text

Figure 2 for FRUIT: Faithfully Reflecting Updated Information in Text

Figure 3 for FRUIT: Faithfully Reflecting Updated Information in Text

Figure 4 for FRUIT: Faithfully Reflecting Updated Information in Text

Abstract:Textual knowledge bases such as Wikipedia require considerable effort to keep up to date and consistent. While automated writing assistants could potentially ease this burden, the problem of suggesting edits grounded in external knowledge has been under-explored. In this paper, we introduce the novel generation task of *faithfully reflecting updated information in text*(FRUIT) where the goal is to update an existing article given new evidence. We release the FRUIT-WIKI dataset, a collection of over 170K distantly supervised data produced from pairs of Wikipedia snapshots, along with our data generation pipeline and a gold evaluation set of 914 instances whose edits are guaranteed to be supported by the evidence. We provide benchmark results for popular generation systems as well as EDIT5 -- a T5-based approach tailored to editing we introduce that establishes the state of the art. Our analysis shows that developing models that can update articles faithfully requires new capabilities for neural generation models, and opens doors to many new applications.

* v1.0

Via

Access Paper or Ask Questions

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Jul 01, 2021

Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel

Figure 1 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Figure 2 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Figure 3 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Figure 4 for Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

Abstract:Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts, prompts that contain neither task-specific templates nor training examples, and achieve competitive accuracy to manually-tuned prompts across a wide range of tasks. While finetuning LMs does introduce new parameters for each downstream task, we show that this memory overhead can be substantially reduced: finetuning only the bias terms can achieve comparable or better accuracy than standard finetuning while only updating 0.1% of the parameters. All in all, we recommend finetuning LMs for few-shot learning as it is more accurate, robust to different prompts, and can be made nearly as efficient as using frozen LMs.

Via

Access Paper or Ask Questions

AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Nov 07, 2020

Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, Sameer Singh

Figure 1 for AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Figure 2 for AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Figure 3 for AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Figure 4 for AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Abstract:The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge, however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AutoPrompt, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AutoPrompt, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.

* v2: Fixed error in Figure 2

Via

Access Paper or Ask Questions

Active Bayesian Assessment for Black-Box Classifiers

Feb 16, 2020

Disi Ji, Robert L. Logan IV, Padhraic Smyth, Mark Steyvers

Figure 1 for Active Bayesian Assessment for Black-Box Classifiers

Figure 2 for Active Bayesian Assessment for Black-Box Classifiers

Figure 3 for Active Bayesian Assessment for Black-Box Classifiers

Figure 4 for Active Bayesian Assessment for Black-Box Classifiers

Abstract:Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a crucial need to assess the performance of these pre-trained models, for instance to ensure sufficient predictive accuracy, or that class probabilities are well-calibrated. Furthermore, since labeled data may be scarce or costly to collect, it is desirable for such assessment be performed in an efficient manner. In this paper, we introduce a Bayesian approach for model assessment that satisfies these desiderata. We develop inference strategies to quantify uncertainty for common assessment metrics (accuracy, misclassification cost, expected calibration error), and propose a framework for active assessment using this uncertainty to guide efficient selection of instances for labeling. We illustrate the benefits of our approach in experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets.

Via

Access Paper or Ask Questions

Knowledge Enhanced Contextual Word Representations

Sep 09, 2019

Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith

Figure 1 for Knowledge Enhanced Contextual Word Representations

Figure 2 for Knowledge Enhanced Contextual Word Representations

Figure 3 for Knowledge Enhanced Contextual Word Representations

Figure 4 for Knowledge Enhanced Contextual Word Representations

Abstract:Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities. We propose a general method to embed multiple knowledge bases (KBs) into large scale models, and thereby enhance their representations with structured, human-curated knowledge. For each KB, we first use an integrated entity linker to retrieve relevant entity embeddings, then update contextual word representations via a form of word-to-entity attention. In contrast to previous approaches, the entity linkers and self-supervised language modeling objective are jointly trained end-to-end in a multitask setting that combines a small amount of entity linking supervision with a large amount of raw text. After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation. KnowBert's runtime is comparable to BERT's and it scales to large KBs.

* EMNLP 2019

Via

Access Paper or Ask Questions

Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Jun 20, 2019

Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh

Figure 1 for Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Figure 2 for Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Figure 3 for Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Figure 4 for Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Abstract:Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts from a knowledge graph that are relevant to the context. These mechanisms enable the model to render information it has never seen before, as well as generate out-of-vocabulary tokens. We also introduce the Linked WikiText-2 dataset, a corpus of annotated text aligned to the Wikidata knowledge graph whose contents (roughly) match the popular WikiText-2 benchmark. In experiments, we demonstrate that the KGLM achieves significantly better performance than a strong baseline language model. We additionally compare different language model's ability to complete sentences requiring factual knowledge, showing that the KGLM outperforms even very large language models in generating facts.

Via

Access Paper or Ask Questions