Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sameep Mehta

A Framework for Testing and Adapting REST APIs as LLM Tools

Apr 22, 2025

Jayachandu Bandlamudi, Ritwik Chaudhuri, Neelamadhav Gantayat, Kushal Mukherjee, Prerna Agarwal, Renuka Sindhgatta, Sameep Mehta

Abstract:Large Language Models (LLMs) are enabling autonomous agents to perform complex workflows using external tools or functions, often provided via REST APIs in enterprise systems. However, directly utilizing these APIs as tools poses challenges due to their complex input schemas, elaborate responses, and often ambiguous documentation. Current benchmarks for tool testing do not adequately address these complexities, leading to a critical gap in evaluating API readiness for agent-driven automation. In this work, we present a novel testing framework aimed at evaluating and enhancing the readiness of REST APIs to function as tools for LLM-based agents. Our framework transforms apis as tools, generates comprehensive test cases for the APIs, translates tests cases into natural language instructions suitable for agents, enriches tool definitions and evaluates the agent's ability t correctly invoke the API and process its inputs and responses. To provide actionable insights, we analyze the outcomes of 750 test cases, presenting a detailed taxonomy of errors, including input misinterpretation, output handling inconsistencies, and schema mismatches. Additionally, we classify these test cases to streamline debugging and refinement of tool integrations. This work offers a foundational step toward enabling enterprise APIs as tools, improving their usability in agent-based applications.

Via

Access Paper or Ask Questions

xLP: Explainable Link Prediction for Master Data Management

Mar 14, 2024

Balaji Ganesan, Matheen Ahmed Pasha, Srinivasa Parkala, Neeraj R Singh, Gayatri Mishra, Sumit Bhatia, Hima Patel, Somashekar Naganna, Sameep Mehta

Figure 1 for xLP: Explainable Link Prediction for Master Data Management

Figure 2 for xLP: Explainable Link Prediction for Master Data Management

Figure 3 for xLP: Explainable Link Prediction for Master Data Management

Figure 4 for xLP: Explainable Link Prediction for Master Data Management

Abstract:Explaining neural model predictions to users requires creativity. Especially in enterprise applications, where there are costs associated with users' time, and their trust in the model predictions is critical for adoption. For link prediction in master data management, we have built a number of explainability solutions drawing from research in interpretability, fact verification, path ranking, neuro-symbolic reasoning and self-explaining AI. In this demo, we present explanations for link prediction in a creative way, to allow users to choose explanations they are more comfortable with.

* 8 pages, 4 figures, NeurIPS 2020 Competition and Demonstration Track. arXiv admin note: text overlap with arXiv:2012.05516

Via

Access Paper or Ask Questions

LLMGuard: Guarding Against Unsafe LLM Behavior

Feb 27, 2024

Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

Figure 1 for LLMGuard: Guarding Against Unsafe LLM Behavior

Figure 2 for LLMGuard: Guarding Against Unsafe LLM Behavior

Abstract:Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.

* accepted in demonstration track of AAAI-24

Via

Access Paper or Ask Questions

"Beware of deception": Detecting Half-Truth and Debunking it through Controlled Claim Editing

Aug 15, 2023

Sandeep Singamsetty, Nishtha Madaan, Sameep Mehta, Varad Bhatnagar, Pushpak Bhattacharyya

Figure 1 for "Beware of deception": Detecting Half-Truth and Debunking it through Controlled Claim Editing

Figure 2 for "Beware of deception": Detecting Half-Truth and Debunking it through Controlled Claim Editing

Figure 3 for "Beware of deception": Detecting Half-Truth and Debunking it through Controlled Claim Editing

Figure 4 for "Beware of deception": Detecting Half-Truth and Debunking it through Controlled Claim Editing

Abstract:The prevalence of half-truths, which are statements containing some truth but that are ultimately deceptive, has risen with the increasing use of the internet. To help combat this problem, we have created a comprehensive pipeline consisting of a half-truth detection model and a claim editing model. Our approach utilizes the T5 model for controlled claim editing; "controlled" here means precise adjustments to select parts of a claim. Our methodology achieves an average BLEU score of 0.88 (on a scale of 0-1) and a disinfo-debunk score of 85% on edited claims. Significantly, our T5-based approach outperforms other Language Models such as GPT2, RoBERTa, PEGASUS, and Tailor, with average improvements of 82%, 57%, 42%, and 23% in disinfo-debunk scores, respectively. By extending the LIAR PLUS dataset, we achieve an F1 score of 82% for the half-truth detection model, setting a new benchmark in the field. While previous attempts have been made at half-truth detection, our approach is, to the best of our knowledge, the first to attempt to debunk half-truths.

Via

Access Paper or Ask Questions

CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

Jun 01, 2023

Rahul Madhavan, Rishabh Garg, Kahini Wadhawan, Sameep Mehta

Figure 1 for CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

Figure 2 for CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

Figure 3 for CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

Figure 4 for CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation

Abstract:We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method, in the context of LM detoxification, and propose the Causally Fair Language (CFL) architecture for detoxifying pre-trained LMs in a plug-and-play manner. Our architecture is based on a Structural Causal Model (SCM) that is mathematically transparent and computationally efficient as compared with many existing detoxification techniques. We also propose several new metrics that aim to better understand the behaviour of LMs in the context of toxic text generation. Further, we achieve state of the art performance for toxic degeneration, which are computed using \RTP (RTP) benchmark. Our experiments show that CFL achieves such a detoxification without much impact on the model perplexity. We also show that CFL mitigates the unintended bias problem through experiments on the BOLD dataset.

* 19 pages, 10 figures. Findings of ACL 2023

Via

Access Paper or Ask Questions

Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets

Sep 05, 2021

Nitin Gupta, Hima Patel, Shazia Afzal, Naveen Panwar, Ruhi Sharma Mittal, Shanmukha Guttula, Abhinav Jain, Lokesh Nagalapatti, Sameep Mehta, Sandeep Hans(+3 more)

Figure 1 for Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets

Figure 2 for Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets

Figure 3 for Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets

Figure 4 for Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets

Abstract:The quality of training data has a huge impact on the efficiency, accuracy and complexity of machine learning tasks. Various tools and techniques are available that assess data quality with respect to general cleaning and profiling checks. However these techniques are not applicable to detect data issues in the context of machine learning tasks, like noisy labels, existence of overlapping classes etc. We attempt to re-look at the data quality issues in the context of building a machine learning pipeline and build a tool that can detect, explain and remediate issues in the data, and systematically and automatically capture all the changes applied to the data. We introduce the Data Quality Toolkit for machine learning as a library of some key quality metrics and relevant remediation techniques to analyze and enhance the readiness of structured training datasets for machine learning projects. The toolkit can reduce the turn-around times of data preparation pipelines and streamline the data quality assessment process. Our toolkit is publicly available via IBM API Hub [1] platform, any developer can assess the data quality using the IBM's Data Quality for AI apis [2]. Detailed tutorials are also available on IBM Learning Path [3].

Via

Access Paper or Ask Questions

Explainable Link Prediction for Privacy-Preserving Contact Tracing

Dec 10, 2020

Balaji Ganesan, Hima Patel, Sameep Mehta

Figure 1 for Explainable Link Prediction for Privacy-Preserving Contact Tracing

Figure 2 for Explainable Link Prediction for Privacy-Preserving Contact Tracing

Figure 3 for Explainable Link Prediction for Privacy-Preserving Contact Tracing

Figure 4 for Explainable Link Prediction for Privacy-Preserving Contact Tracing

Abstract:Contact Tracing has been used to identify people who were in close proximity to those infected with SARS-Cov2 coronavirus. A number of digital contract tracing applications have been introduced to facilitate or complement physical contact tracing. However, there are a number of privacy issues in the implementation of contract tracing applications, which make people reluctant to install or update their infection status on these applications. In this concept paper, we present ideas from Graph Neural Networks and explainability, that could improve trust in these applications, and encourage adoption by people.

* 8 pages, 7 figures, SpicyFL 2020 Workshop at NeurIPS 2020

Via

Access Paper or Ask Questions

Data Readiness Report

Oct 15, 2020

Shazia Afzal, Rajmohan C, Manish Kesarwani, Sameep Mehta, Hima Patel

Abstract:Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current practices of data cleaning and data readiness assessment for machine learning tasks are mostly conducted in an arbitrary manner which limits their reuse and results in loss of productivity. We introduce the concept of a Data Readiness Report as an accompanying documentation to a dataset that allows data consumers to get detailed insights into the quality of input data. Data characteristics and challenges on various quality dimensions are identified and documented keeping in mind the principles of transparency and explainability. The Data Readiness Report also serves as a record of all data assessment operations including applied transformations. This provides a detailed lineage for the purpose of data governance and management. In effect, the report captures and documents the actions taken by various personas in a data readiness and assessment workflow. Overtime this becomes a repository of best practices and can potentially drive a recommendation system for building automated data readiness workflows on the lines of AutoML [8]. We anticipate that together with the Datasheets [9], Dataset Nutrition Label [11], FactSheets [1] and Model Cards [15], the Data Readiness Report makes significant progress towards Data and AI lifecycle documentation.

Via

Access Paper or Ask Questions

Fair Transfer of Multiple Style Attributes in Text

Jan 18, 2020

Karan Dabas, Nishtha Madan, Vijay Arya, Sameep Mehta, Gautam Singh, Tanmoy Chakraborty

Figure 1 for Fair Transfer of Multiple Style Attributes in Text

Figure 2 for Fair Transfer of Multiple Style Attributes in Text

Figure 3 for Fair Transfer of Multiple Style Attributes in Text

Figure 4 for Fair Transfer of Multiple Style Attributes in Text

Abstract:To preserve anonymity and obfuscate their identity on online platforms users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet these past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, it would be nice to have a style transfer mechanism that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated or written text incorporated with a desired degree of multiple soft styles such as female-quality, politeness, or formalness. In this work, we demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers. This is because each single style-transfer step often reverses or dominates over the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp data set to demonstrate our superior performance as compared to existing one-style transfer steps performed in a sequence.

Via

Access Paper or Ask Questions

Hardening Deep Neural Networks via Adversarial Model Cascades

Nov 04, 2018

Deepak Vijaykeerthy, Anshuman Suri, Sameep Mehta, Ponnurangam Kumaraguru

Figure 1 for Hardening Deep Neural Networks via Adversarial Model Cascades

Figure 2 for Hardening Deep Neural Networks via Adversarial Model Cascades

Figure 3 for Hardening Deep Neural Networks via Adversarial Model Cascades

Figure 4 for Hardening Deep Neural Networks via Adversarial Model Cascades

Abstract:Deep neural networks (DNNs) are vulnerable to malicious inputs crafted by an adversary to produce erroneous outputs. Works on securing neural networks against adversarial examples achieve high empirical robustness on simple datasets such as MNIST. However, these techniques are inadequate when empirically tested on complex data sets such as CIFAR-10 and SVHN. Further, existing techniques are designed to target specific attacks and fail to generalize across attacks. We propose the Adversarial Model Cascades (AMC) as a way to tackle the above inadequacies. Our approach trains a cascade of models sequentially where each model is optimized to be robust towards a mixture of multiple attacks. Ultimately, it yields a single model which is secure against a wide range of attacks; namely FGSM, Elastic, Virtual Adversarial Perturbations and Madry. On an average, AMC increases the model's empirical robustness against various attacks simultaneously, by a significant margin (of 6.225% for MNIST, 5.075% for SVHN and 2.65% for CIFAR10). At the same time, the model's performance on non-adversarial inputs is comparable to the state-of-the-art models.

Via

Access Paper or Ask Questions