Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ateret Anaby-Tavor

On the Robustness of Agentic Function Calling

Apr 01, 2025

Ella Rabinovich, Ateret Anaby-Tavor

Abstract:Large Language Models (LLMs) are increasingly acting as autonomous agents, with function calling (FC) capabilities enabling them to invoke specific tools for tasks. While prior research has primarily focused on improving FC accuracy, little attention has been given to the robustness of these agents to perturbations in their input. We introduce a benchmark assessing FC robustness in two key areas: resilience to naturalistic query variations, and stability in function calling when the toolkit expands with semantically related tools. Evaluating best-performing FC models on a carefully expanded subset of the Berkeley function calling leaderboard (BFCL), we identify critical weaknesses in existing evaluation methodologies, and highlight areas for improvement in real-world agentic deployments.

* 7 pages, TrustNLP@NAACL25

Via

Access Paper or Ask Questions

Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Oct 22, 2024

Itay Nakash, George Kour, Guy Uziel, Ateret Anaby-Tavor

Figure 1 for Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Figure 2 for Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Figure 3 for Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Figure 4 for Breaking ReAct Agents: Foot-in-the-Door Attack Will Get You In

Abstract:Following the advancement of large language models (LLMs), the development of LLM-based autonomous agents has become increasingly prevalent. As a result, the need to understand the security vulnerabilities of these agents has become a critical task. We examine how ReAct agents can be exploited using a straightforward yet effective method we refer to as the foot-in-the-door attack. Our experiments show that indirect prompt injection attacks, prompted by harmless and unrelated requests (such as basic calculations) can significantly increase the likelihood of the agent performing subsequent malicious actions. Our results show that once a ReAct agents thought includes a specific tool or action, the likelihood of executing this tool in the subsequent steps increases significantly, as the agent seldom re-evaluates its actions. Consequently, even random, harmless requests can establish a foot-in-the-door, allowing an attacker to embed malicious instructions into the agents thought process, making it more susceptible to harmful directives. To mitigate this vulnerability, we propose implementing a simple reflection mechanism that prompts the agent to reassess the safety of its actions during execution, which can help reduce the success of such attacks.

Via

Access Paper or Ask Questions

Exploring Straightforward Conversational Red-Teaming

Sep 07, 2024

George Kour, Naama Zwerdling, Marcel Zalmanovici, Ateret Anaby-Tavor, Ora Nova Fandina, Eitan Farchi

Figure 1 for Exploring Straightforward Conversational Red-Teaming

Figure 2 for Exploring Straightforward Conversational Red-Teaming

Figure 3 for Exploring Straightforward Conversational Red-Teaming

Figure 4 for Exploring Straightforward Conversational Red-Teaming

Abstract:Large language models (LLMs) are increasingly used in business dialogue systems but they pose security and ethical risks. Multi-turn conversations, where context influences the model's behavior, can be exploited to produce undesired responses. In this paper, we examine the effectiveness of utilizing off-the-shelf LLMs in straightforward red-teaming approaches, where an attacker LLM aims to elicit undesired output from a target LLM, comparing both single-turn and conversational red-teaming tactics. Our experiments offer insights into various usage strategies that significantly affect their performance as red teamers. They suggest that off-the-shelf models can act as effective red teamers and even adjust their attack strategy based on past attempts, although their effectiveness decreases with greater alignment.

Via

Access Paper or Ask Questions

A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios

Aug 04, 2024

Samuel Ackerman, Ella Rabinovich, Eitan Farchi, Ateret Anaby-Tavor

Abstract:We evaluate the robustness of several large language models on multiple datasets. Robustness here refers to the relative insensitivity of the model's answers to meaning-preserving variants of their input. Benchmark datasets are constructed by introducing naturally-occurring, non-malicious perturbations, or by generating semantically equivalent paraphrases of input questions or statements. We further propose a novel metric for assessing a model robustness, and demonstrate its benefits in the non-adversarial scenario by empirical evaluation of several models on the created datasets.

Via

Access Paper or Ask Questions

From Zero to Hero: Cold-Start Anomaly Detection

May 30, 2024

Tal Reiss, George Kour, Naama Zwerdling, Ateret Anaby-Tavor, Yedid Hoshen

Figure 1 for From Zero to Hero: Cold-Start Anomaly Detection

Figure 2 for From Zero to Hero: Cold-Start Anomaly Detection

Figure 3 for From Zero to Hero: Cold-Start Anomaly Detection

Figure 4 for From Zero to Hero: Cold-Start Anomaly Detection

Abstract:When first deploying an anomaly detection system, e.g., to detect out-of-scope queries in chatbots, there are no observed data, making data-driven approaches ineffective. Zero-shot anomaly detection methods offer a solution to such "cold-start" cases, but unfortunately they are often not accurate enough. This paper studies the realistic but underexplored cold-start setting where an anomaly detection model is initialized using zero-shot guidance, but subsequently receives a small number of contaminated observations (namely, that may include anomalies). The goal is to make efficient use of both the zero-shot guidance and the observations. We propose ColdFusion, a method that effectively adapts the zero-shot anomaly detector to contaminated observations. To support future development of this new setting, we propose an evaluation suite consisting of evaluation protocols and metrics.

* ACL 2024. Our code is available at https://github.com/talreiss/ColdFusion

Via

Access Paper or Ask Questions

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Mar 09, 2024

Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor(+25 more)

Figure 1 for Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Figure 2 for Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Figure 3 for Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Figure 4 for Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

Abstract:Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.

Via

Access Paper or Ask Questions

SpeCrawler: Generating OpenAPI Specifications from API Documentation Using Large Language Models

Feb 18, 2024

Koren Lazar, Matan Vetzler, Guy Uziel, David Boaz, Esther Goldbraich, David Amid, Ateret Anaby-Tavor

Abstract:In the digital era, the widespread use of APIs is evident. However, scalable utilization of APIs poses a challenge due to structure divergence observed in online API documentation. This underscores the need for automatic tools to facilitate API consumption. A viable approach involves the conversion of documentation into an API Specification format. While previous attempts have been made using rule-based methods, these approaches encountered difficulties in generalizing across diverse documentation. In this paper we introduce SpeCrawler, a comprehensive system that utilizes large language models (LLMs) to generate OpenAPI Specifications from diverse API documentation through a carefully crafted pipeline. By creating a standardized format for numerous APIs, SpeCrawler aids in streamlining integration processes within API orchestrating systems and facilitating the incorporation of tools into LLMs. The paper explores SpeCrawler's methodology, supported by empirical evidence and case studies, demonstrating its efficacy through LLM capabilities.

* Under Review for KDD 2024

Via

Access Paper or Ask Questions

What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs

Feb 18, 2024

Eran Hirsch, Guy Uziel, Ateret Anaby-Tavor

Abstract:Planning is a fundamental task in artificial intelligence that involves finding a sequence of actions that achieve a specified goal in a given environment. Large language models (LLMs) are increasingly used for applications that require planning capabilities, such as web or embodied agents. In line with recent studies, we demonstrate through experimentation that LLMs lack necessary skills required for planning. Based on these observations, we advocate for the potential of a hybrid approach that combines LLMs with classical planning methodology. Then, we introduce SimPlan, a novel hybrid-method, and evaluate its performance in a new challenging setup. Our extensive experiments across various planning domains demonstrate that SimPlan significantly outperforms existing LLM-based planners.

* 8 pages and an appendix

Via

Access Paper or Ask Questions

Unveiling Safety Vulnerabilities of Large Language Models

Nov 07, 2023

George Kour, Marcel Zalmanovici, Naama Zwerdling, Esther Goldbraich, Ora Nova Fandina, Ateret Anaby-Tavor, Orna Raz, Eitan Farchi

Figure 1 for Unveiling Safety Vulnerabilities of Large Language Models

Figure 2 for Unveiling Safety Vulnerabilities of Large Language Models

Figure 3 for Unveiling Safety Vulnerabilities of Large Language Models

Figure 4 for Unveiling Safety Vulnerabilities of Large Language Models

Abstract:As large language models become more prevalent, their possible harmful or inappropriate responses are a cause for concern. This paper introduces a unique dataset containing adversarial examples in the form of questions, which we call AttaQ, designed to provoke such harmful or inappropriate responses. We assess the efficacy of our dataset by analyzing the vulnerabilities of various models when subjected to it. Additionally, we introduce a novel automatic approach for identifying and naming vulnerable semantic regions - input semantic areas for which the model is likely to produce harmful outputs. This is achieved through the application of specialized clustering techniques that consider both the semantic similarity of the input attacks and the harmfulness of the model's responses. Automatically identifying vulnerable semantic regions enhances the evaluation of model weaknesses, facilitating targeted improvements to its safety mechanisms and overall reliability.

* To be published in GEM workshop. Conference on Empirical Methods in Natural Language Processing (EMNLP). 2023

Via

Access Paper or Ask Questions

Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

Nov 02, 2023

Ella Rabinovich, Samuel Ackerman, Orna Raz, Eitan Farchi, Ateret Anaby-Tavor

Abstract:Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the dataset to the community. We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy, for building and evaluating a framework for factual QA reference-less performance prediction -- predicting the likelihood of a language model to accurately answer a question. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging, significantly outperforming baselines, results.

* EMNLP2023 GEM workshop, 17 pages

Via

Access Paper or Ask Questions