Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yara Rizk

NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls

Sep 04, 2024

Kinjal Basu, Ibrahim Abdelaziz, Kelsey Bradford, Maxwell Crouse, Kiran Kate, Sadhana Kumaravel, Saurabh Goyal, Asim Munawar, Yara Rizk, Xin Wang(+2 more)

Abstract:Autonomous agent applications powered by large language models (LLMs) have recently risen to prominence as effective tools for addressing complex real-world tasks. At their core, agentic workflows rely on LLMs to plan and execute the use of tools and external Application Programming Interfaces (APIs) in sequence to arrive at the answer to a user's request. Various benchmarks and leaderboards have emerged to evaluate an LLM's capabilities for tool and API use; however, most of these evaluations only track single or multiple isolated API calling capabilities. In this paper, we present NESTFUL, a benchmark to evaluate LLMs on nested sequences of API calls, i.e., sequences where the output of one API call is passed as input to a subsequent call. NESTFUL has a total of 300 human annotated samples divided into two types - executable and non-executable. The executable samples are curated manually by crawling Rapid-APIs whereas the non-executable samples are hand picked by human annotators from data synthetically generated using an LLM. We evaluate state-of-the-art LLMs with function calling abilities on NESTFUL. Our results show that most models do not perform well on nested APIs in NESTFUL as compared to their performance on the simpler problem settings available in existing benchmarks.

Via

Access Paper or Ask Questions

Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Jun 27, 2024

Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Sadhana Kumaravel, Matthew Stallone, Rameswar Panda, Yara Rizk, GP Bhargav, Maxwell Crouse, Chulaka Gunasekara(+16 more)

Figure 1 for Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Figure 2 for Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Figure 3 for Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Figure 4 for Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

Abstract:Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application program interfaces (APIs) to complete complex tasks. These tasks together are termed function calling. Endowing LLMs with function calling abilities leads to a myriad of advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling, those being Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 other best proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.

Via

Access Paper or Ask Questions

TESS: A Multi-intent Parser for Conversational Multi-Agent Systems with Decentralized Natural Language Understanding Models

Dec 19, 2023

Burak Aksar, Yara Rizk, Tathagata Chakraborti

Abstract:Chatbots have become one of the main pathways for the delivery of business automation tools. Multi-agent systems offer a framework for designing chatbots at scale, making it easier to support complex conversations that span across multiple domains as well as enabling developers to maintain and expand their capabilities incrementally over time. However, multi-agent systems complicate the natural language understanding (NLU) of user intents, especially when they rely on decentralized NLU models: some utterances (termed single intent) may invoke a single agent while others (termed multi-intent) may explicitly invoke multiple agents. Without correctly parsing multi-intent inputs, decentralized NLU approaches will not achieve high prediction accuracy. In this paper, we propose an efficient parsing and orchestration pipeline algorithm to service multi-intent utterances from the user in the context of a multi-agent system. Our proposed approach achieved comparable performance to competitive deep learning models on three different datasets while being up to 48 times faster.

* 16 pages

Via

Access Paper or Ask Questions

TaskDiff: A Similarity Metric for Task-Oriented Conversations

Oct 25, 2023

Ankita Bhaumik, Praveen Venkateswaran, Yara Rizk, Vatche Isahagian

Abstract:The popularity of conversational digital assistants has resulted in the availability of large amounts of conversational data which can be utilized for improved user experience and personalized response generation. Building these assistants using popular large language models like ChatGPT also require additional emphasis on prompt engineering and evaluation methods. Textual similarity metrics are a key ingredient for such analysis and evaluations. While many similarity metrics have been proposed in the literature, they have not proven effective for task-oriented conversations as they do not take advantage of unique conversational features. To address this gap, we present TaskDiff, a novel conversational similarity metric that utilizes different dialogue components (utterances, intents, and slots) and their distributions to compute similarity. Extensive experimental evaluation of TaskDiff on a benchmark dataset demonstrates its superior performance and improved robustness over other related approaches.

* Accepted to the main conference at EMNLP 2023

Via

Access Paper or Ask Questions

A Case for Business Process-Specific Foundation Models

Oct 26, 2022

Yara Rizk, Praveen Venkateswaran, Vatche Isahagian, Vinod Muthusamy

Abstract:The inception of large language models has helped advance state-of-the-art performance on numerous natural language tasks. This has also opened the door for the development of foundation models for other domains and data modalities such as images, code, and music. In this paper, we argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models to handle tasks like process mining, optimization, and decision making. These models should also tackle the unique challenges of applying AI to business processes which include data scarcity, multi-modal representations, domain specific terminology, and privacy concerns.

Via

Access Paper or Ask Questions

Extending LIME for Business Process Automation

Aug 09, 2021

Sohini Upadhyay, Vatche Isahagian, Vinod Muthusamy, Yara Rizk

Figure 1 for Extending LIME for Business Process Automation

Figure 2 for Extending LIME for Business Process Automation

Figure 3 for Extending LIME for Business Process Automation

Figure 4 for Extending LIME for Business Process Automation

Abstract:AI business process applications automate high-stakes business decisions where there is an increasing demand to justify or explain the rationale behind algorithmic decisions. Business process applications have ordering or constraints on tasks and feature values that cause lightweight, model-agnostic, existing explanation methods like LIME to fail. In response, we propose a local explanation framework extending LIME for explaining AI business process applications. Empirical evaluation of our extension underscores the advantage of our approach in the business process setting.

Via

Access Paper or Ask Questions

Explainable Composition of Aggregated Assistants

Nov 21, 2020

Sarath Sreedharan, Tathagata Chakraborti, Yara Rizk, Yasaman Khazaeni

Figure 1 for Explainable Composition of Aggregated Assistants

Figure 2 for Explainable Composition of Aggregated Assistants

Figure 3 for Explainable Composition of Aggregated Assistants

Figure 4 for Explainable Composition of Aggregated Assistants

Abstract:A new design of an AI assistant that has become increasingly popular is that of an "aggregated assistant" -- realized as an orchestrated composition of several individual skills or agents that can each perform atomic tasks. In this paper, we will talk about the role of planning in the automated composition of such assistants and explore how concepts in automated planning can help to establish transparency of the inner workings of the assistant to the end-user.

Via

Access Paper or Ask Questions

From Robotic Process Automation to Intelligent Process Automation: Emerging Trends

Jul 27, 2020

Tathagata Chakraborti, Vatche Isahagian, Rania Khalaf, Yasaman Khazaeni, Vinod Muthusamy, Yara Rizk, Merve Unuvar

Figure 1 for From Robotic Process Automation to Intelligent Process Automation: Emerging Trends

Figure 2 for From Robotic Process Automation to Intelligent Process Automation: Emerging Trends

Figure 3 for From Robotic Process Automation to Intelligent Process Automation: Emerging Trends

Abstract:In this survey, we study how recent advances in machine intelligence are disrupting the world of business processes. Over the last decade, there has been steady progress towards the automation of business processes under the umbrella of ``robotic process automation'' (RPA). However, we are currently at an inflection point in this evolution, as a new paradigm called ``Intelligent Process Automation'' (IPA) emerges, bringing machine learning (ML) and artificial intelligence (AI) technologies to bear in order to improve business process outcomes. The purpose of this paper is to provide a survey of this emerging theme and identify key open research challenges at the intersection of AI and business processes. We hope that this emerging theme will spark engaging conversations at the RPA Forum.

* Internation Conference on Business Process Management 2020 RPA Forum

Via

Access Paper or Ask Questions

A Conversational Digital Assistant for Intelligent Process Automation

Jul 27, 2020

Yara Rizk, Vatche Isahagian, Scott Boag, Yasaman Khazaeni, Merve Unuvar, Vinod Muthusamy, Rania Khalaf

Figure 1 for A Conversational Digital Assistant for Intelligent Process Automation

Figure 2 for A Conversational Digital Assistant for Intelligent Process Automation

Figure 3 for A Conversational Digital Assistant for Intelligent Process Automation

Figure 4 for A Conversational Digital Assistant for Intelligent Process Automation

Abstract:Robotic process automation (RPA) has emerged as the leading approach to automate tasks in business processes. Moving away from back-end automation, RPA automated the mouse-click on user interfaces; this outside-in approach reduced the overhead of updating legacy software. However, its many shortcomings, namely its lack of accessibility to business users, have prevented its widespread adoption in highly regulated industries. In this work, we explore interactive automation in the form of a conversational digital assistant. It allows business users to interact with and customize their automation solutions through natural language. The framework, which creates such assistants, relies on a multi-agent orchestration model and conversational wrappers for autonomous agents including RPAs. We demonstrate the effectiveness of our proposed approach on a loan approval business process and a travel preapproval business process.

* International Conference on Business Process Management 2020 RPA Forum

Via

Access Paper or Ask Questions

A Snooze-less User-Aware Notification System for Proactive Conversational Agents

Mar 04, 2020

Yara Rizk, Vatche Isahagian, Merve Unuvar, Yasaman Khazaeni

Figure 1 for A Snooze-less User-Aware Notification System for Proactive Conversational Agents

Abstract:The ubiquity of smart phones and electronic devices has placed a wealth of information at the fingertips of consumers as well as creators of digital content. This has led to millions of notifications being issued each second from alerts about posted YouTube videos to tweets, emails and personal messages. Adding work related notifications and we can see how quickly the number of notifications increases. Not only does this cause reduced productivity and concentration but has also been shown to cause alert fatigue. This condition makes users desensitized to notifications, causing them to ignore or miss important alerts. Depending on what domain users work in, the cost of missing a notification can vary from a mere inconvenience to life and death. Therefore, in this work, we propose an alert and notification framework that intelligently issues, suppresses and aggregates notifications, based on event severity, user preferences, or schedules, to minimize the need for users to ignore, or snooze their notifications and potentially forget about addressing important ones. Our framework can be deployed as a backend service, but is better suited to be integrated into proactive conversational agents, a field receiving a lot of attention with the digital transformation era, email services, news services and others. However, the main challenge lies in developing the right machine learning algorithms that can learn models from a wide set of users while customizing these models to individual users' preferences.

Via

Access Paper or Ask Questions