Abstract:Model selection for a given target task can be costly, as it may entail extensive annotation of the quality of outputs of different models. We introduce DiffUse, an efficient method to make an informed decision between candidate text generation models. DiffUse reduces the required number of preference annotations, thus saving valuable time and resources during evaluation. DiffUse intelligently selects instances by clustering embeddings that represent the semantic differences between model outputs. In this way, it identifies a subset of examples that are more informative for preference decisions. Our method is model-agnostic, and can be applied to any text generation model. Moreover, we propose a practical iterative approach for dynamically determining how many instances to annotate. In a series of experiments over hundreds of model pairs, we demonstrate that DiffUse can dramatically reduce the required number of annotations -- by up to 75% -- while maintaining high evaluation reliability.
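To make the selection scheme concrete, here is a minimal sketch of the idea described in this abstract: embed both models' outputs, cluster the difference vectors, and annotate one representative per cluster. The `embed` function, the use of KMeans, and the centroid-based representative selection are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of the selection idea, not the authors' reference implementation.
# `embed` is any sentence-embedding function; KMeans and centroid-based picking are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def select_for_annotation(outputs_a, outputs_b, embed, n_annotations=10, seed=0):
    """Pick a small, informative subset of examples for preference annotation."""
    emb_a = np.array([embed(o) for o in outputs_a])
    emb_b = np.array([embed(o) for o in outputs_b])
    diff = emb_a - emb_b                                   # represents the semantic difference
    km = KMeans(n_clusters=n_annotations, random_state=seed, n_init=10).fit(diff)
    selected = []
    for c in range(n_annotations):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(diff[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))    # representative nearest to centroid
    return selected, km.labels_                            # labels_ allow size-weighted aggregation
```

Cluster sizes can then be used to weight each annotated preference when aggregating an overall verdict -- one plausible way to extrapolate a reliable decision from few annotations.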
Abstract:Applying language models to natural language processing tasks typically relies on the representations in the final model layer, as intermediate hidden layer representations are presumed to be less informative. In this work, we argue that due to the gradual improvement across model layers, additional information can be gleaned from the contrast between higher and lower layers during inference. Specifically, in choosing between the probable next token predictions of a generative model, the predictions of lower layers can be used to highlight which candidates are best avoided. We propose a novel approach that utilizes the contrast between layers to improve text generation outputs, and show that it mitigates degenerative behaviors of the model in open-ended generation, significantly improving the quality of generated texts. Furthermore, our results indicate that contrasting model layers at inference time can substantially benefit certain aspects of general language model capabilities, extracting knowledge more effectively from a given set of model parameters.
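The following sketch illustrates the layer-contrast idea described above, assuming next-token logits have already been obtained by applying the LM head to the final layer and to an earlier layer. Restricting the contrast to the full model's top candidates is our assumption and may differ from the paper's exact scoring.

```python
# Illustrative sketch of contrasting a lower layer against the final layer when picking
# the next token; the restriction to the final layer's top-k candidates is an assumption.
import torch

def contrastive_next_token(final_logits, lower_logits, top_k=20):
    """Both inputs: 1-D next-token logits from applying the LM head to the respective layer."""
    final_logp = torch.log_softmax(final_logits, dim=-1)
    lower_logp = torch.log_softmax(lower_logits, dim=-1)
    candidates = torch.topk(final_logp, top_k).indices           # plausibility set from the full model
    contrast = final_logp[candidates] - lower_logp[candidates]   # penalize what lower layers also favor
    return candidates[torch.argmax(contrast)]
```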
Abstract:Many organizations require their customer-care agents to manually summarize their conversations with customers. These summaries are vital for the organizations' decision-making. The perspective from which a summary should be written depends on its intended application. In this work, we study the multi-perspective summarization of customer-care conversations between support agents and customers. We observe that different heuristics are associated with summaries of different perspectives, and exploit these heuristics to create weakly-labeled data for intermediate training of the models before fine-tuning on scarce human-annotated summaries. Most importantly, we show that our approach enables models to generate multi-perspective summaries with a very small amount of annotated data. For example, our approach achieves 94% of the performance (ROUGE-2) of a model trained on the original data, while training on only 7% of it.
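As a purely illustrative example of perspective-specific weak labeling (the abstract does not spell out the actual heuristics), one might build noisy training targets directly from dialog turns; the cue words and turn-selection rules below are assumptions.

```python
# Purely illustrative weak-labeling heuristic; the paper's actual heuristics are not
# specified in the abstract. Builds noisy perspective-specific "summaries" from dialog
# turns, for intermediate training before fine-tuning on scarce gold summaries.
RESOLUTION_CUES = ("resolved", "refund", "replacement", "escalate")  # assumed cue words

def weak_label(dialog):
    """dialog: list of (speaker, text) pairs, with speaker in {'agent', 'customer'}."""
    customer_view = [t for s, t in dialog if s == "customer"][:2]    # opening problem statement
    agent_view = [t for s, t in dialog
                  if s == "agent" and any(c in t.lower() for c in RESOLUTION_CUES)]
    return {"customer_perspective": " ".join(customer_view),
            "agent_perspective": " ".join(agent_view)}
```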
Abstract:In a typical customer service chat scenario, customers contact a support center to ask for help or raise complaints, and human agents try to solve the issues. In most cases, at the end of the conversation, agents are asked to write a short summary emphasizing the problem and the proposed solution, usually for the benefit of other agents that may have to deal with the same customer or issue. The goal of the present article is to advance the automation of this task. We introduce the first large-scale, high-quality customer-care dialog summarization dataset, with close to 6500 human-annotated summaries. The data is based on real-world customer support dialogs and includes both extractive and abstractive summaries. We also introduce a new unsupervised, extractive summarization method specific to dialogs.
Abstract:We present HowSumm, a novel large-scale dataset for the task of query-focused multi-document summarization (qMDS), which targets the use-case of generating actionable instructions from a set of sources. This use-case is different from the use-cases covered in existing multi-document summarization (MDS) datasets and is applicable to educational and industrial scenarios. We employed automatic methods, and leveraged statistics from existing human-crafted qMDS datasets, to create HowSumm from wikiHow website articles and the sources they cite. We describe the creation of the dataset and discuss the unique features that distinguish it from other summarization corpora. Automatic and human evaluations of both extractive and abstractive summarization models on the dataset reveal that there is room for improvement.
Abstract:Many conversation datasets have been constructed in recent years using crowdsourcing. However, the data collection process can be time-consuming and presents many challenges for ensuring data quality. Since language generation has improved immensely with the advancement of pre-trained language models, we investigate how such models can be utilized to generate entire conversations, given only a summary of a conversation as input. We explore three approaches to generating summary-grounded conversations, and evaluate the generated conversations using automatic measures and human judgments. We also show that the accuracy of conversation summarization can be improved by augmenting a conversation summarization dataset with generated conversations.
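A minimal sketch of the simplest prompt-based variant of summary-grounded generation follows; the choice of gpt2 as the generator and the prompt format are placeholders, and the paper's three approaches are not reproduced here.

```python
# Minimal sketch of summary-grounded conversation generation for data augmentation.
# gpt2 and the prompt format are placeholders for whatever pre-trained generator is used.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def summary_to_conversation(summary, max_new_tokens=200):
    prompt = (f"Summary: {summary}\n"
              "Write the full conversation between the agent and the customer.\n"
              "Customer:")
    out = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)[0]["generated_text"]
    return "Customer:" + out[len(prompt):]   # keep only the generated dialog

# The generated (conversation, summary) pairs can then be added to the training set of a
# conversation summarization model, alongside the original crowdsourced pairs.
```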
Abstract:Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging relevant Wikipedia sections to extract weak labels for sentences describing economic events. Whereas previous weakly supervised approaches required a knowledge-base of such events, or corresponding financial figures, our approach requires no such additional data, and can be employed to extract economic events related to companies which are not even mentioned in the training data.
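For illustration only, weak labeling from Wikipedia might look like the sketch below; the section titles used as positive and negative signals are assumptions, since the abstract does not name the sections that are actually leveraged.

```python
# Illustrative only: the abstract does not specify which Wikipedia sections are used, so
# the section titles below are assumptions. Sentences under event-like sections become
# weak positives; sentences from unrelated sections become weak negatives.
EVENT_SECTIONS = {"Acquisitions", "Bankruptcy", "Lawsuits", "Restructuring"}  # assumed
NEGATIVE_SECTIONS = {"Products", "Corporate identity"}                        # assumed

def weak_label_sentences(article_sections):
    """article_sections: dict mapping a section title to its list of sentences."""
    labeled = []
    for title, sentences in article_sections.items():
        if title in EVENT_SECTIONS:
            labeled += [(s, 1) for s in sentences]
        elif title in NEGATIVE_SECTIONS:
            labeled += [(s, 0) for s in sentences]
    return labeled  # used to train a sentence classifier for economic events
```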
Abstract:One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic. Most previous work addressed this task by retrieving a relatively small number of relevant documents as the initial source for such content. This line of research yielded moderate success, which is of limited use in a real-world system. Furthermore, for such a system to yield a comprehensive set of relevant arguments over a wide range of topics, it must leverage a large and diverse corpus in an appropriate manner. Here we present a first end-to-end, high-precision, corpus-wide argument mining system. This is made possible by combining sentence-level queries over an appropriate indexing of a very large corpus of newspaper articles with an iterative annotation scheme. This scheme addresses the inherent label bias in the data and pinpoints the regions of the sample space whose manual labeling is required to obtain high precision among top-ranked candidates.
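The sketch below illustrates what a sentence-level retrieval filter over such an index might look like; the claim-marker lexicon and length cutoff are assumptions, not the queries actually used by the system.

```python
# Illustrative sentence-level retrieval filter; the actual queries and index are not
# described in the abstract, so the marker lexicon and length cutoff are assumptions.
CLAIM_MARKERS = ("argue", "claim", "evidence", "studies show", "according to")

def is_candidate(sentence, topic):
    s = sentence.lower()
    return topic.lower() in s and any(m in s for m in CLAIM_MARKERS) and len(s.split()) < 60

# sentences would normally stream from a sentence-level index over the newspaper corpus
corpus_sentences = ["Experts argue that nuclear energy reduces carbon emissions.",
                    "The plant opened in 1978."]
print([s for s in corpus_sentences if is_candidate(s, "nuclear energy")])
```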
Abstract:Competitive debaters often find themselves facing a challenging task -- how to debate a topic they know very little about, with only minutes to prepare, and without access to books or the Internet? What they often do is rely on "first principles", commonplace arguments which are relevant to many topics, and which they have refined in past debates. In this work we aim to explicitly define a taxonomy of such principled recurring arguments, and, given a controversial topic, to automatically identify which of these arguments are relevant to the topic. As far as we know, this is the first time that this approach to argument invention is formalized and made explicit in the context of NLP. The main goal of this work is to show that it is possible to define such a taxonomy. While the taxonomy suggested here should be thought of as a "first attempt", it is nonetheless coherent, covers the relevant topics well, coincides with what professional debaters actually argue in their speeches, and facilitates automatic argument invention for new topics.
Abstract:With the growing interest in social applications of Natural Language Processing and Computational Argumentation, a natural question is how controversial a given concept is. Prior works relied on Wikipedia's metadata and on content analysis of the articles pertaining to a concept in question. Here we show that the immediate textual context of a concept is strongly indicative of this property, and, using simple and language-independent machine-learning tools, we leverage this observation to achieve state-of-the-art results in controversiality prediction. In addition, we analyze and make available a new dataset of concepts labeled for controversiality. It is significantly larger than existing datasets, and grades concepts on a 0-10 scale, rather than treating controversiality as a binary label.
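A toy sketch of the described setup -- representing a concept by the immediate textual context of its mentions and training a simple, language-independent classifier -- is shown below; the character n-gram features, the classifier, and the toy data are placeholders.

```python
# Toy sketch: features, classifier, and data are placeholders, not the paper's exact setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# each item: the concatenated contexts of one concept's mentions, with a binary label
# (a 0-10 controversiality score could instead be thresholded or fed to a regressor)
contexts = ["heated debate over the ban ... critics argue ... protests erupted ...",
            "the river flows through three provinces ... popular fishing spot ..."]
labels = [1, 0]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),  # language-independent features
    LogisticRegression(),
)
clf.fit(contexts, labels)
print(clf.predict(["lawmakers clashed over the controversial proposal to ..."]))
```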