Yoav Kantor

An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation

May 06, 2025

Matan Orbach, Ohad Eytan, Benjamin Sznajder, Ariel Gera, Odellia Boni, Yoav Kantor, Gal Bloch, Omri Levy, Hadas Abraham, Nitzan Barzilay (+5 more)

Abstract: Finding the optimal Retrieval-Augmented Generation (RAG) configuration for a given use case can be complex and expensive. Motivated by this challenge, frameworks for RAG hyper-parameter optimization (HPO) have recently emerged, yet their effectiveness has not been rigorously benchmarked. To address this gap, we present a comprehensive study involving 5 HPO algorithms over 5 datasets from diverse domains, including a new one collected for this work on real-world product documentation. Our study explores the largest HPO search space considered to date, with two optimized evaluation metrics. Analysis of the results shows that RAG HPO can be done efficiently, either greedily or with iterative random search, and that it significantly boosts RAG performance for all datasets. For greedy HPO approaches, we show that optimizing models first is preferable to the prevalent practice of optimizing sequentially according to the RAG pipeline order.
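
The greedy strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the search space, the parameter names, the two orderings, and the `toy_score` function are all assumptions made up for illustration, standing in for a real RAG evaluation run. The sketch tunes one hyper-parameter at a time, in a caller-supplied order, keeping a value only if it improves the score.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, object]


def greedy_hpo(
    search_space: Dict[str, Sequence[object]],
    order: List[str],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Greedily tune one hyper-parameter at a time, in the given order."""
    # Start from the first listed value of every hyper-parameter.
    best: Config = {name: values[0] for name, values in search_space.items()}
    best_score = evaluate(best)
    for name in order:
        for value in search_space[name]:
            if value == best[name]:
                continue  # this value is already part of the evaluated best
            candidate = {**best, name: value}
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "chunk_size": [256, 512],
    "embedding_model": ["small-embedder", "large-embedder"],
    "top_k": [3, 5],
    "generator": ["small-llm", "large-llm"],
}

# Two orderings: following the pipeline stages, or tuning the models first.
PIPELINE_ORDER = ["chunk_size", "embedding_model", "top_k", "generator"]
MODELS_FIRST = ["generator", "embedding_model", "chunk_size", "top_k"]


def toy_score(cfg: Config) -> float:
    """Stand-in for a real RAG evaluation (made-up additive score)."""
    score = 0.3 if cfg["generator"] == "large-llm" else 0.1
    score += 0.2 if cfg["embedding_model"] == "large-embedder" else 0.1
    score += 0.05 if cfg["chunk_size"] == 512 else 0.0
    score += 0.05 if cfg["top_k"] == 5 else 0.0
    return score


if __name__ == "__main__":
    print(greedy_hpo(SEARCH_SPACE, MODELS_FIRST, toy_score))
```

Because the toy score is additive, both orderings reach the same configuration here. With a real evaluation, where hyper-parameters interact, the ordering can change which configuration the greedy search settles on, which is the effect the abstract reports.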

CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies

Nov 19, 2023

Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

Abstract: Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, and annotating event and subevent relations. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open-source tool that allows users to incrementally construct both clusters and hierarchy simultaneously over any type of text. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees that transitivity is maintained at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster hierarchy annotations and resolve disagreements.

* EMNLP 2023

From Key Points to Key Point Hierarchy: Structured and Expressive Opinion Summarization

Jun 06, 2023

Arie Cattan, Lilach Eden, Yoav Kantor, Roy Bar-Haim

Abstract: Key Point Analysis (KPA) has recently been proposed for deriving fine-grained insights from collections of textual comments. KPA extracts the main points in the data as a list of concise sentences or phrases, termed key points, and quantifies their prevalence. While key points are more expressive than word clouds and key phrases, making sense of a long, flat list of key points, which often express related ideas in varying levels of granularity, may still be challenging. To address this limitation of KPA, we introduce the task of organizing a given set of key points into a hierarchy, according to their specificity. Such hierarchies may be viewed as a novel type of Textual Entailment Graph. We develop ThinkP, a high-quality benchmark dataset of key point hierarchies for business and product reviews, obtained by consolidating multiple annotations. We compare different methods for predicting pairwise relations between key points, and for inferring a hierarchy from these pairwise predictions. In particular, for the task of computing pairwise key point relations, we achieve significant gains over existing strong baselines by applying directional distributional similarity methods to a novel distributional representation of key points, and further boost performance via weak supervision.

* ACL 2023

Project Debater APIs: Decomposing the AI Grand Challenge

Oct 03, 2021

Roy Bar-Haim, Yoav Kantor, Elad Venezian, Yoav Katz, Noam Slonim

Abstract: Project Debater was revealed in 2019 as the first AI system that can debate human experts on complex topics. Engaging in a live debate requires a diverse set of skills, and Project Debater has been developed accordingly as a collection of components, each designed to perform a specific subtask. Project Debater APIs provide access to many of these capabilities, as well as to more recently developed ones. This diverse set of web services, publicly available for academic use, includes core NLP services, argument mining and analysis capabilities, and higher-level services for content summarization. We describe these APIs and their performance, and demonstrate how they can be used for building practical solutions. In particular, we will focus on Key Point Analysis, a novel technology that identifies the main points and their prevalence in a collection of texts such as survey responses and user reviews.

* EMNLP 2021 (Demonstrations)

Every Bite Is an Experience: Key Point Analysis of Business Reviews

Jun 12, 2021

Roy Bar-Haim, Lilach Eden, Yoav Kantor, Roni Friedman, Noam Slonim

Abstract: Previous work on review summarization focused on measuring the sentiment toward the main aspects of the reviewed product or business, or on creating a textual summary. These approaches provide only a partial view of the data: aspect-based sentiment summaries lack sufficient explanation or justification for the aspect rating, while textual summaries do not quantify the significance of each element, and are not well-suited for representing conflicting views. Recently, Key Point Analysis (KPA) has been proposed as a summarization framework that provides both a textual and a quantitative summary of the main points in the data. We adapt KPA to review data by introducing Collective Key Point Mining for better key point extraction; integrating sentiment analysis into KPA; identifying good key point candidates for review summaries; and leveraging the massive amount of available reviews and their metadata. We show empirically that these novel extensions of KPA substantially improve its performance. We demonstrate that promising results can be achieved without any domain-specific annotation, while human supervision can lead to further improvement.

* ACL-IJCNLP 2021

Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis

Oct 11, 2020

Roy Bar-Haim, Yoav Kantor, Lilach Eden, Roni Friedman, Dan Lahav, Noam Slonim

Abstract: When summarizing a collection of views, arguments or opinions on some topic, it is often desirable not only to extract the most salient points, but also to quantify their prevalence. Work on multi-document summarization has traditionally focused on creating textual summaries, which lack this quantitative aspect. Recent work has proposed to summarize arguments by mapping them to a small set of expert-generated key points, where the salience of each key point corresponds to the number of its matching arguments. The current work advances key point analysis in two important respects. First, we develop a method for automatic extraction of key points, which enables fully automatic analysis, and is shown to achieve performance comparable to a human expert. Second, we demonstrate that the applicability of key point analysis goes well beyond argumentation data. Using models trained on publicly available argumentation datasets, we achieve promising results in two additional domains: municipal surveys and user reviews. An additional contribution is an in-depth evaluation of argument-to-key point matching models, where we substantially outperform previous results.

* EMNLP 2020
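
The salience notion described in the abstract above, where a key point's salience is the number of arguments matched to it, can be sketched in a few lines. The function name `key_point_salience`, the input shape, and the sample data are assumptions for illustration only; the matching step itself (deciding which key point, if any, an argument maps to) is taken as given here.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def key_point_salience(
    matches: Iterable[Tuple[str, Optional[str]]],
) -> List[Tuple[str, int]]:
    """Count matched arguments per key point, most salient first.

    Each item is (argument_id, key_point), where key_point is None
    when the argument matched no key point.
    """
    counts = Counter(kp for _, kp in matches if kp is not None)
    return counts.most_common()


# Made-up example data: five arguments, two key points, one unmatched.
example_matches = [
    ("arg1", "KP1"),
    ("arg2", "KP1"),
    ("arg3", "KP2"),
    ("arg4", None),
    ("arg5", "KP1"),
]

if __name__ == "__main__":
    print(key_point_salience(example_matches))
```

Unmatched arguments are simply left out of the counts, so the salience values need not sum to the number of arguments.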

From Arguments to Key Points: Towards Automatic Argument Summarization

May 04, 2020

Roy Bar-Haim, Lilach Eden, Roni Friedman, Yoav Kantor, Dan Lahav, Noam Slonim

Abstract: Generating a concise summary from a large collection of arguments on a given topic is an intriguing yet understudied problem. We propose to represent such summaries as a small set of talking points, termed "key points", each scored according to its salience. We show, by analyzing a large dataset of crowd-contributed arguments, that a small number of key points per topic is typically sufficient for covering the vast majority of the arguments. Furthermore, we found that a domain expert can often predict these key points in advance. We study the task of argument-to-key point mapping, and introduce a novel large-scale dataset for this task. We report empirical results for an extensive set of experiments with this dataset, showing promising performance.

* ACL 2020

A Dataset of General-Purpose Rebuttal

Sep 01, 2019

Matan Orbach, Yonatan Bilu, Ariel Gera, Yoav Kantor, Lena Dankin, Tamar Lavee, Lili Kotlerman, Shachar Mirkin, Michal Jacovi, Ranit Aharonov (+1 more)

Abstract: In Natural Language Understanding, the task of response generation is usually focused on responses to short texts, such as tweets or a turn in a dialog. Here we present a novel task of producing a critical response to a long argumentative text, and suggest a method based on general rebuttal arguments to address it. We do this in the context of the recently-suggested task of listening comprehension over argumentative content: given a speech on some specified topic, and a list of relevant arguments, the goal is to determine which of the arguments appear in the speech. The general rebuttals we describe here (written in English) overcome the need for topic-specific arguments to be provided, by proving to be applicable for a large set of topics. This allows creating responses beyond the scope of topics for which specific arguments are available. All data collected during this work is freely available for research.

* EMNLP 2019

Towards Effective Rebuttal: Listening Comprehension using Corpus-Wide Claim Mining

Jul 27, 2019

Tamar Lavee, Matan Orbach, Lili Kotlerman, Yoav Kantor, Shai Gretz, Lena Dankin, Shachar Mirkin, Michal Jacovi, Yonatan Bilu, Ranit Aharonov (+1 more)

Abstract: Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of 400 speeches in English discussing 200 controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.

* 6th Argument Mining Workshop @ ACL 2019

Learning to combine Grammatical Error Corrections

Jun 10, 2019

Yoav Kantor, Yoav Katz, Leshem Choshen, Edo Cohen-Karlik, Naftali Liberman, Assaf Toledo, Amir Menczel, Noam Slonim

Abstract: The field of Grammatical Error Correction (GEC) has produced various systems to deal with focused phenomena or general text editing. We propose an automatic way to combine black-box systems. Our method automatically detects the strength of a system or the combination of several systems per error type, improving precision and recall while optimizing F-score directly. We show consistent improvement over the best standalone system in all the configurations tested. This approach also outperforms average ensembling of different RNN models with random initializations. In addition, we analyze the use of BERT for GEC, reporting promising results. We also present a spellchecker created for this task, which outperforms the standard spellcheckers we tested on the task of spellchecking. This paper describes a system submission to the Building Educational Applications 2019 Shared Task: Grammatical Error Correction. Combining the output of top BEA 2019 shared task systems using our approach currently holds the highest reported score in the open phase of the BEA 2019 shared task, improving F0.5 by 3.7 points over the best result reported.

* BEA 2019
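
For readers unfamiliar with the F0.5 metric cited in the abstract above, the standard F-beta definition combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R); with beta = 0.5 it weights precision more heavily than recall. The helper below is a minimal sketch of that general formula, not code from the paper, and the name `f_beta` and the zero-division convention are assumptions.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # assumed convention: avoid dividing by zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Swapping precision and recall changes the score when beta != 1.
    print(f_beta(0.8, 0.4), f_beta(0.4, 0.8))
```

The higher of the two printed values is the one with the higher precision, which is why F0.5 suits settings such as GEC where a wrong correction is considered more costly than a missed one.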
