Abstract:Large language models (LLMs) have recently shown tremendous promise in serving as the backbone of agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application programming interfaces (APIs) to complete complex tasks. Together, these tasks are termed function calling. Endowing LLMs with function calling abilities brings many advantages, such as access to current and domain-specific information in databases and knowledge sources, and the ability to outsource tasks that can be reliably performed by tools, e.g., a Python interpreter or calculator. While there has been significant progress in function calling with LLMs, there is still a dearth of open models that perform on par with proprietary LLMs like GPT, Claude, and Gemini. Therefore, in this work, we introduce the GRANITE-20B-FUNCTIONCALLING model under an Apache 2.0 license. The model is trained using a multi-task training approach on seven fundamental tasks encompassed in function calling, namely Nested Function Calling, Function Chaining, Parallel Functions, Function Name Detection, Parameter-Value Pair Detection, Next-Best Function, and Response Generation. We present a comprehensive evaluation on multiple out-of-domain datasets comparing GRANITE-20B-FUNCTIONCALLING to more than 15 other leading proprietary and open models. GRANITE-20B-FUNCTIONCALLING provides the best performance among all open models on the Berkeley Function Calling Leaderboard and ranks fourth overall. As a result of the diverse tasks and datasets used for training our model, we show that GRANITE-20B-FUNCTIONCALLING has better generalizability on multiple tasks in seven different evaluation datasets.
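To make the notion of function calling concrete, the following is a minimal sketch of the runtime side: a model emits a structured call, and a dispatcher identifies the function by name and passes along the parameter-value pairs. The JSON call format, the `TOOLS` registry, and the two example tools are assumptions made for illustration; they are not the output format or tool set of GRANITE-20B-FUNCTIONCALLING.

```python
import json

# Hypothetical tools; names, signatures, and values are illustrative only.
def add(a: float, b: float) -> float:
    return a + b

def get_temperature(city: str) -> float:
    # Stand-in for a real API call; the readings are made up for the sketch.
    return {"Paris": 18.0, "Cairo": 31.0}[city]

# Assumed registry mapping function names to callables.
TOOLS = {"add": add, "get_temperature": get_temperature}

def execute_call(call_json: str):
    """Parse one model-emitted call and dispatch it to the matching tool."""
    call = json.loads(call_json)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise KeyError(f"unknown function: {name}")
    return TOOLS[name](**args)

def execute_parallel(calls_json: str):
    """Run a list of independent calls (the 'parallel functions' case)."""
    return [execute_call(json.dumps(c)) for c in json.loads(calls_json)]

def execute_chain(city_a: str, city_b: str) -> float:
    """Function chaining: outputs of two calls feed a third call."""
    t_a = execute_call(json.dumps(
        {"name": "get_temperature", "arguments": {"city": city_a}}))
    t_b = execute_call(json.dumps(
        {"name": "get_temperature", "arguments": {"city": city_b}}))
    return execute_call(json.dumps(
        {"name": "add", "arguments": {"a": t_a, "b": t_b}}))
```

In this sketch, the model's job would be to produce the JSON strings; the dispatcher only validates the function name and forwards the arguments.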
Abstract:Robotic process automation (RPA) and its next evolutionary stage, intelligent process automation, promise to drive improvements in efficiencies and process outcomes. However, how can business leaders decide how to integrate intelligent automation into business processes? What is an appropriate division of labor between humans and machines? How should combined human-AI teams be evaluated? For RPA, the human labor cost and the robotic labor cost are often directly compared to make an automation decision. In this position paper, we argue for a broader view that incorporates the potential for multiple levels of autonomy and human involvement, as well as a wider range of metrics beyond productivity when integrating digital workers into a business process.
Abstract:In this survey, we study how recent advances in machine intelligence are disrupting the world of business processes. Over the last decade, there has been steady progress towards the automation of business processes under the umbrella of "robotic process automation" (RPA). However, we are currently at an inflection point in this evolution, as a new paradigm called "Intelligent Process Automation" (IPA) emerges, bringing machine learning (ML) and artificial intelligence (AI) technologies to bear in order to improve business process outcomes. The purpose of this paper is to provide a survey of this emerging theme and identify key open research challenges at the intersection of AI and business processes. We hope that this emerging theme will spark engaging conversations at the RPA Forum.
Abstract:Robotic process automation (RPA) has emerged as the leading approach to automate tasks in business processes. Moving away from back-end automation, RPA automated mouse clicks on user interfaces; this outside-in approach reduced the overhead of updating legacy software. However, its many shortcomings, notably its lack of accessibility to business users, have prevented its widespread adoption in highly regulated industries. In this work, we explore interactive automation in the form of a conversational digital assistant. It allows business users to interact with and customize their automation solutions through natural language. The framework, which creates such assistants, relies on a multi-agent orchestration model and conversational wrappers for autonomous agents including RPAs. We demonstrate the effectiveness of our proposed approach on a loan approval business process and a travel preapproval business process.
Abstract:The ubiquity of smartphones and electronic devices has placed a wealth of information at the fingertips of consumers as well as creators of digital content. This has led to millions of notifications being issued each second, from alerts about posted YouTube videos to tweets, emails and personal messages. Add work-related notifications, and we can see how quickly the number of notifications grows. Not only does this reduce productivity and concentration, but it has also been shown to cause alert fatigue. This condition desensitizes users to notifications, causing them to ignore or miss important alerts. Depending on the domain users work in, the cost of missing a notification can vary from a mere inconvenience to a matter of life and death. Therefore, in this work, we propose an alert and notification framework that intelligently issues, suppresses and aggregates notifications, based on event severity, user preferences, or schedules, to minimize the need for users to ignore or snooze their notifications and potentially forget about addressing important ones. Our framework can be deployed as a backend service, but is better suited to be integrated into proactive conversational agents (a field receiving a lot of attention in the digital transformation era), email services, news services and others. However, the main challenge lies in developing the right machine learning algorithms that can learn models from a wide set of users while customizing these models to individual users' preferences.
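The issue, suppress, and aggregate decisions described above can be illustrated with a small rule-based sketch. This is a hand-written stand-in, not the learned models the abstract calls for; the severity scale, the `UserPrefs` fields, and the order of the rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ISSUE = "issue"          # deliver immediately
    SUPPRESS = "suppress"    # drop silently
    AGGREGATE = "aggregate"  # hold for a later digest

@dataclass(frozen=True)
class Notification:
    source: str
    severity: int  # assumed scale: 1 (low) to 5 (critical)
    hour: int      # hour of day the event occurred, 0-23

@dataclass(frozen=True)
class UserPrefs:
    muted_sources: frozenset = frozenset()  # sources the user opted out of
    quiet_hours: frozenset = frozenset()    # hours reserved by the schedule
    issue_threshold: int = 4                # minimum severity to interrupt

def triage(n: Notification, prefs: UserPrefs) -> Action:
    """Hypothetical rule order: severity, then preferences, then schedule."""
    if n.severity >= 5:
        return Action.ISSUE      # critical events always get through
    if n.source in prefs.muted_sources:
        return Action.SUPPRESS   # respect explicit user preferences
    if n.hour in prefs.quiet_hours:
        return Action.AGGREGATE  # defer during scheduled quiet time
    if n.severity >= prefs.issue_threshold:
        return Action.ISSUE
    return Action.AGGREGATE      # low-severity events are batched
```

A learned version would replace the fixed threshold and rule order with per-user models, which is the challenge the abstract identifies.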
Abstract:Business process automation is a booming multi-billion-dollar industry that promises to remove menial tasks from workers' plates, through the introduction of autonomous agents, and free up their time and brain power for more creative and engaging tasks. However, an essential component of the successful deployment of such autonomous agents is the ability of business users to monitor their performance and customize their execution. A simple and user-friendly interface with a low learning curve is necessary to increase the adoption of such agents in banking, insurance, retail and other domains. As a result, proactive chatbots will play a crucial role in the business automation space. Not only can they respond to users' queries and perform actions on their behalf, but they can also initiate communication with the users to inform them of the system's behavior. This will provide business users with a natural language interface to interact with, monitor and control autonomous agents. In this work, we present a multi-agent orchestration framework to develop such proactive chatbots by discussing the types of skills that can be composed into agents and how to orchestrate these agents. Two use cases on a travel preapproval business process and a loan application business process are adopted to qualitatively analyze the proposed framework based on four criteria: performance, coding overhead, scalability, and agent overlap.