Shreya Rajpal

Universität Hamburg, Hamburg, Germany, Vellore Institute of Technology, Vellore, Tamil Nadu, India

LExT: Towards Evaluating Trustworthiness of Natural Language Explanations

Apr 08, 2025

Krithi Shailya, Shreya Rajpal, Gokul S Krishnan, Balaraman Ravindran

Abstract: As Large Language Models (LLMs) become increasingly integrated into high-stakes domains, several approaches have been proposed for generating natural language explanations. These explanations are crucial for enhancing the interpretability of a model, especially in sensitive domains like healthcare, where transparency and reliability are key. Because such explanations are generated by LLMs, which carry known concerns of their own, there is a growing need for robust evaluation frameworks to assess model-generated explanations. Natural Language Generation metrics like BLEU and ROUGE capture syntactic and semantic accuracies but overlook other crucial aspects such as factual accuracy, consistency, and faithfulness. To address this gap, we propose a general framework for quantifying the trustworthiness of natural language explanations, balancing Plausibility and Faithfulness, to derive a comprehensive Language Explanation Trustworthiness Score (LExT). The code and setup to reproduce our experiments are publicly available at https://github.com/cerai-iitm/LExT. Applying our domain-agnostic framework to the healthcare domain using public medical datasets, we evaluate six models, including domain-specific and general-purpose models. Our findings demonstrate significant differences in their ability to generate trustworthy explanations. On comparing these explanations, we make interesting observations, such as inconsistencies in Faithfulness demonstrated by general-purpose models and their tendency to outperform domain-specific fine-tuned models. This work further highlights the importance of using a tailored evaluation framework to assess natural language explanations in sensitive fields, providing a foundation for improving the trustworthiness and transparency of language models in healthcare and beyond.
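
The abstract says the score balances Plausibility and Faithfulness but does not state the formula. As a rough illustration only, one common way to balance two scores in [0, 1] is a harmonic mean, which stays low unless both inputs are high. The function name `balanced_score` and the choice of harmonic mean are assumptions made here for the sketch, not taken from the abstract; the repository linked above is the place to check the actual definition.

```python
def balanced_score(plausibility: float, faithfulness: float) -> float:
    """Harmonic mean of two scores in [0, 1].

    Assumption for illustration: the abstract does not specify how the
    two components are combined.
    """
    for name, value in (("plausibility", plausibility),
                        ("faithfulness", faithfulness)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = plausibility + faithfulness
    if total == 0.0:
        # Both scores are zero; avoid dividing by zero.
        return 0.0
    return 2.0 * plausibility * faithfulness / total


if __name__ == "__main__":
    # A plausible but unfaithful explanation is pulled down sharply.
    print(balanced_score(0.9, 0.3))
```

Under this assumed combination, an explanation cannot score well by being strong on only one of the two components.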

BERTologyNavigator: Advanced Question Answering with BERT-based Semantics

Jan 17, 2024

Shreya Rajpal, Ricardo Usbeck

Abstract: The development and integration of knowledge graphs and language models are significant in artificial intelligence and natural language processing. In this study, we introduce the BERTologyNavigator, a two-phased system that combines relation extraction techniques and BERT embeddings to navigate the relationships within the DBLP Knowledge Graph (KG). In the first phase, our approach extracts one-hop relations and labelled candidate pairs. The second phase then uses BERT's CLS embeddings and additional heuristics for relation selection. Our system reaches an F1 score of 0.2175 on the DBLP QuAD Final test dataset for Scholarly QALD and an F1 score of 0.98 on the subset of the DBLP QuAD test dataset during the QA phase.

* Joint Proceedings of Scholarly QALD 2023 and SemREC 2023 co-located with 22nd International Semantic Web Conference ISWC 2023. Athens, Greece, November 6-10, 2023
* Accepted in Scholarly QALD Challenge @ ISWC 2023
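
The second phase described in the abstract selects a relation using embeddings. A minimal sketch of that idea, assuming the selection is done by cosine similarity between a question vector and one vector per candidate relation, might look as follows. The similarity measure, the function names, the relation labels, and the three-dimensional toy vectors are all assumptions for illustration; real CLS embeddings would come from a BERT model, and the paper's additional heuristics are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_relation(question_vec, candidates):
    """Return the candidate label whose vector is most similar to the question.

    `candidates` maps a relation label to its embedding vector.
    """
    if not candidates:
        raise ValueError("no candidate relations given")
    return max(candidates, key=lambda label: cosine(question_vec, candidates[label]))


if __name__ == "__main__":
    # Toy vectors standing in for CLS embeddings (hypothetical values).
    question = [1.0, 0.0, 0.2]
    one_hop = {
        "authoredBy": [0.9, 0.1, 0.1],
        "publishedIn": [0.0, 1.0, 0.0],
    }
    print(select_relation(question, one_hop))
```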

Learning Type-Aware Embeddings for Fashion Compatibility

Jul 27, 2018

Mariya I. Vasileva, Bryan A. Plummer, Krishna Dusad, Shreya Rajpal, Ranjitha Kumar, David Forsyth

Abstract: Outfits in online fashion data are composed of items of many different types (e.g. top, bottom, shoes) that share some stylistic relationship with one another. A representation for building outfits requires a method that can learn both notions of similarity (for example, when two tops are interchangeable) and compatibility (items of possibly different types that can go together in an outfit). This paper presents an approach to learning an image embedding that respects item type, and jointly learns notions of item similarity and compatibility in an end-to-end model. To evaluate the learned representation, we crawled 68,306 outfits created by users on the Polyvore website. Our approach obtains a 3-5% improvement over the state-of-the-art on outfit compatibility prediction and fill-in-the-blank tasks using our dataset, as well as an established smaller dataset, while supporting a variety of useful queries.

* Accepted at ECCV 2018
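
To make the idea of an embedding that "respects item type" concrete, here is a minimal sketch in which compatibility between two items is measured only along dimensions selected by a mask chosen for their pair of types. The masking scheme, the names `masked_distance` and `compatibility_distance`, and all numeric values are assumptions for illustration; the abstract does not spell out the mechanism, and nothing here is learned from data.

```python
def masked_distance(x, y, mask):
    """Squared Euclidean distance restricted to the dimensions in `mask`."""
    if not (len(x) == len(y) == len(mask)):
        raise ValueError("x, y and mask must have the same length")
    return sum((m * (a - b)) ** 2 for a, b, m in zip(x, y, mask))


def pair_key(type_a, type_b):
    """Order-independent key for a pair of item types."""
    return tuple(sorted((type_a, type_b)))


def compatibility_distance(item_a, item_b, masks):
    """Distance between two (type, vector) items under their type-pair mask.

    Raises KeyError if no mask is registered for the pair of types.
    """
    (type_a, vec_a), (type_b, vec_b) = item_a, item_b
    return masked_distance(vec_a, vec_b, masks[pair_key(type_a, type_b)])


if __name__ == "__main__":
    # Hypothetical mask: for (shoes, top) only dimensions 0 and 2 matter.
    masks = {pair_key("top", "shoes"): [1.0, 0.0, 1.0]}
    top = ("top", [0.2, 0.9, 0.4])
    shoes = ("shoes", [0.2, 0.1, 0.4])
    # The items differ only in the masked-out dimension, so they are
    # treated as close for this type pair.
    print(compatibility_distance(top, shoes, masks))
```

Using a separate mask per type pair lets two items be close for one pairing (top with shoes) without forcing them to be close in the unmasked space, which is one way to keep similarity and compatibility from collapsing into a single notion.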

Octopus: A Framework for Cost-Quality-Time Optimization in Crowdsourcing

Aug 15, 2017

Karan Goel, Shreya Rajpal, Mausam

Abstract: We present Octopus, an AI agent to jointly balance three conflicting task objectives on a micro-crowdsourcing marketplace: the quality of work, total cost incurred, and time to completion. Previous control agents have mostly focused on cost-quality or cost-time tradeoffs, but not on directly controlling all three in concert. A naive formulation of three-objective optimization is intractable; Octopus takes a hierarchical POMDP approach, with three different components responsible for setting the pay per task, selecting the next task, and controlling task-level quality. We demonstrate that Octopus significantly outperforms existing state-of-the-art approaches on real experiments. We also deploy Octopus on Amazon Mechanical Turk, showing its ability to manage tasks in a real-world dynamic setting.

* 10 pages, to appear in HCOMP 2017
