Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ella Rabinovich

School of Computer Science, The Academic College of Tel-Aviv Yaffo, Israel

On the Robustness of Agentic Function Calling

Apr 01, 2025

Ella Rabinovich, Ateret Anaby-Tavor

Abstract:Large Language Models (LLMs) are increasingly acting as autonomous agents, with function calling (FC) capabilities enabling them to invoke specific tools for tasks. While prior research has primarily focused on improving FC accuracy, little attention has been given to the robustness of these agents to perturbations in their input. We introduce a benchmark assessing FC robustness in two key areas: resilience to naturalistic query variations, and stability in function calling when the toolkit expands with semantically related tools. Evaluating best-performing FC models on a carefully expanded subset of the Berkeley function calling leaderboard (BFCL), we identify critical weaknesses in existing evaluation methodologies, and highlight areas for improvement in real-world agentic deployments.

* 7 pages, TrustNLP@NAACL25

Via

Access Paper or Ask Questions

A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios

Aug 04, 2024

Samuel Ackerman, Ella Rabinovich, Eitan Farchi, Ateret Anaby-Tavor

Abstract:We evaluate the robustness of several large language models on multiple datasets. Robustness here refers to the relative insensitivity of the model's answers to meaning-preserving variants of their input. Benchmark datasets are constructed by introducing naturally-occurring, non-malicious perturbations, or by generating semantically equivalent paraphrases of input questions or statements. We further propose a novel metric for assessing a model robustness, and demonstrate its benefits in the non-adversarial scenario by empirical evaluation of several models on the created datasets.

Via

Access Paper or Ask Questions

Automatic Extraction of Disease Risk Factors from Medical Publications

Jul 10, 2024

Maxim Rubchinsky, Ella Rabinovich, Adi Shraibman, Netanel Golan, Tali Sahar, Dorit Shweiki

Abstract:We present a novel approach to automating the identification of risk factors for diseases from medical literature, leveraging pre-trained models in the bio-medical domain, while tuning them for the specific task. Faced with the challenges of the diverse and unstructured nature of medical articles, our study introduces a multi-step system to first identify relevant articles, then classify them based on the presence of risk factor discussions and, finally, extract specific risk factor information for a disease through a question-answering model. Our contributions include the development of a comprehensive pipeline for the automated extraction of risk factors and the compilation of several datasets, which can serve as valuable resources for further research in this area. These datasets encompass a wide range of diseases, as well as their associated risk factors, meticulously identified and validated through a fine-grained evaluation scheme. We conducted both automatic and thorough manual evaluation, demonstrating encouraging results. We also highlight the importance of improving models and expanding dataset comprehensiveness to keep pace with the rapidly evolving field of medical research.

* BioNLP@ACL2024, 12 pages

Via

Access Paper or Ask Questions

That's Optional: A Contemporary Exploration of "that" Omission in English Subordinate Clauses

May 31, 2024

Ella Rabinovich

Abstract:The Uniform Information Density (UID) hypothesis posits that speakers optimize the communicative properties of their utterances by avoiding spikes in information, thereby maintaining a relatively uniform information profile over time. This paper investigates the impact of UID principles on syntactic reduction, specifically focusing on the optional omission of the connector "that" in English subordinate clauses. Building upon previous research, we extend our investigation to a larger corpus of written English, utilize contemporary large language models (LLMs) and extend the information-uniformity principles by the notion of entropy, to estimate the UID manifestations in the usecase of syntactic reduction choices.

* ACL2024 (main conference), 8 pages

Via

Access Paper or Ask Questions

The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings

May 28, 2024

Gili Goldin, Nick Howell, Noam Ordan, Ella Rabinovich, Shuly Wintner

Abstract:We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of the speakers, based on a large database of parliament members and factions that we compiled. We discuss the structure and composition of the corpus and the various processing steps we applied to it. To demonstrate the utility of this novel dataset we present two use cases. We show that the corpus can be used to examine historical developments in the style of political discussions by showing a reduction in lexical richness in the proceedings over time. We also investigate some differences between the styles of men and women speakers. These use cases exemplify the potential of the corpus to shed light on important trends in the Israeli society, supporting research in linguistics, political science, communication, law, etc.

* 28 pages, 7 figures

Via

Access Paper or Ask Questions

Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

Nov 02, 2023

Ella Rabinovich, Samuel Ackerman, Orna Raz, Eitan Farchi, Ateret Anaby-Tavor

Abstract:Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the dataset to the community. We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy, for building and evaluating a framework for factual QA reference-less performance prediction -- predicting the likelihood of a language model to accurately answer a question. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging, significantly outperforming baselines, results.

* EMNLP2023 GEM workshop, 17 pages

Via

Access Paper or Ask Questions

Reliable and Interpretable Drift Detection in Streams of Short Texts

May 28, 2023

Ella Rabinovich, Matan Vetzler, Samuel Ackerman, Ateret Anaby-Tavor

Abstract:Data drift is the change in model input data that is one of the key factors leading to machine learning models performance degradation over time. Monitoring drift helps detecting these issues and preventing their harmful consequences. Meaningful drift interpretation is a fundamental step towards effective re-training of the model. In this study we propose an end-to-end framework for reliable model-agnostic change-point detection and interpretation in large task-oriented dialog systems, proven effective in multiple customer deployments. We evaluate our approach and demonstrate its benefits with a novel variant of intent classification training dataset, simulating customer requests to a dialog system. We make the data publicly available.

* ACL2023 industry track (9 pages)

Via

Access Paper or Ask Questions

Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems

Apr 11, 2022

Ella Rabinovich, Matan Vetzler, David Boaz, Vineet Kumar, Gaurav Pandey, Ateret Anaby-Tavor

Figure 1 for Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems

Figure 2 for Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems

Figure 3 for Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems

Figure 4 for Gaining Insights into Unrecognized User Utterances in Task-Oriented Dialog Systems

Abstract:The rapidly growing market demand for dialogue agents capable of goal-oriented behavior has caused many tech-industry leaders to invest considerable efforts into task-oriented dialog systems. The performance and success of these systems is highly dependent on the accuracy of their intent identification -- the process of deducing the goal or meaning of the user's request and mapping it to one of the known intents for further processing. Gaining insights into unrecognized utterances -- user requests the systems fails to attribute to a known intent -- is therefore a key process in continuous improvement of goal-oriented dialog systems. We present an end-to-end pipeline for processing unrecognized user utterances, including a specifically-tailored clustering algorithm, a novel approach to cluster representative extraction, and cluster naming. We evaluated the proposed clustering algorithm and compared its performance to out-of-the-box SOTA solutions, demonstrating its benefits in the analysis of unrecognized user requests.

Via

Access Paper or Ask Questions

We've had this conversation before: A Novel Approach to Measuring Dialog Similarity

Oct 12, 2021

Ofer Lavi, Ella Rabinovich, Segev Shlomov, David Boaz, Inbal Ronen, Ateret Anaby-Tavor

Figure 1 for We've had this conversation before: A Novel Approach to Measuring Dialog Similarity

Figure 2 for We've had this conversation before: A Novel Approach to Measuring Dialog Similarity

Figure 3 for We've had this conversation before: A Novel Approach to Measuring Dialog Similarity

Figure 4 for We've had this conversation before: A Novel Approach to Measuring Dialog Similarity

Abstract:Dialog is a core building block of human natural language interactions. It contains multi-party utterances used to convey information from one party to another in a dynamic and evolving manner. The ability to compare dialogs is beneficial in many real world use cases, such as conversation analytics for contact center calls and virtual agent design. We propose a novel adaptation of the edit distance metric to the scenario of dialog similarity. Our approach takes into account various conversation aspects such as utterance semantics, conversation flow, and the participants. We evaluate this new approach and compare it to existing document similarity measures on two publicly available datasets. The results demonstrate that our method outperforms the other approaches in capturing dialog flow, and is better aligned with the human perception of conversation similarity.

* EMNLP 2021, 9 pages

Via

Access Paper or Ask Questions

Quantifying Cognitive Factors in Lexical Decline

Oct 12, 2021

David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson

Figure 1 for Quantifying Cognitive Factors in Lexical Decline

Figure 2 for Quantifying Cognitive Factors in Lexical Decline

Figure 3 for Quantifying Cognitive Factors in Lexical Decline

Figure 4 for Quantifying Cognitive Factors in Lexical Decline

Abstract:We adopt an evolutionary view on language change in which cognitive factors (in addition to social ones) affect the fitness of words and their success in the linguistic ecosystem. Specifically, we propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline, in which words greatly decrease in frequency over time. Using historical data across three languages (English, French, and German), we find that most of our proposed factors show a significant difference in the expected direction between each curated set of declining words and their matched stable words. Moreover, logistic regression analyses show that semantic and distributional factors are significant in predicting declining words. Further diachronic analysis reveals that declining words tend to decrease in the diversity of their lexical contexts over time, gradually narrowing their 'ecological niches'.

* Transactions of the Association for Computational Linguistics (TACL) 2021, 16 pages

Via

Access Paper or Ask Questions