Kushan Mitra

FactLens: Benchmarking Fine-Grained Fact Verification

Nov 08, 2024

Kushan Mitra, Dan Zhang, Sajjadur Rahman, Estevam Hruschka

Abstract: Large Language Models (LLMs) have shown impressive capability in language generation and understanding, but their tendency to hallucinate and produce factually incorrect information remains a key limitation. To verify LLM-generated content and claims from other sources, traditional verification approaches often rely on holistic models that assign a single factuality label to complex claims, potentially obscuring nuanced errors. In this paper, we advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification, allowing for more precise identification of inaccuracies, improved transparency, and reduced ambiguity in evidence retrieval. However, generating sub-claims poses challenges, such as maintaining context and ensuring semantic equivalence with respect to the original claim. We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality. The benchmark data is manually curated to ensure high-quality ground truth. Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.

* 12 pages, under review

A Blueprint Architecture of Compound AI Systems for Enterprise

Jun 02, 2024

Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng(+4 more)

Abstract: Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for use in production. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we introduce a blueprint architecture for compound AI systems to operate in enterprise settings cost-effectively and feasibly. Our proposed architecture aims for seamless integration with existing compute and data infrastructure, with "stream" serving as the key orchestration concept to coordinate data and instructions among agents and other components. Task and data planners, respectively, break down, map, and optimize tasks and data to available agents and data sources defined in respective registries, given production constraints such as accuracy and latency.

* Compound AI Systems Workshop at the Data+AI Summit 2024

MEGAnno+: A Human-LLM Collaborative Annotation System

Feb 28, 2024

Hannah Kim, Kushan Mitra, Rafael Li Chen, Sajjadur Rahman, Dan Zhang

Abstract: Large language models (LLMs) can label data faster and cheaper than humans for various NLP tasks. Despite their prowess, LLMs may fall short in understanding complex, sociocultural, or domain-specific context, potentially leading to incorrect annotations. Therefore, we advocate a collaborative approach where humans and LLMs work together to produce reliable and high-quality labels. We present MEGAnno+, a human-LLM collaborative annotation system that offers effective LLM agent and annotation management, convenient and robust LLM annotation, and exploratory verification of LLM labels by humans.

* EACL 2024 Demo

Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks

Nov 09, 2023

Aditi Mishra, Sajjadur Rahman, Hannah Kim, Kushan Mitra, Estevam Hruschka

Abstract: Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision. Yet, their ability to provide well-grounded rationalizations for knowledge-intensive tasks remains under-explored. Such tasks, like commonsense multiple-choice questions, require rationales based on world knowledge to support predictions and refute alternate options. We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner. Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations. Although LLM-generated rationales were preferable, further improvements in conciseness and novelty are required. In another study, we show how rationalization of incorrect model predictions erodes humans' trust in LLM-generated rationales. Motivated by these observations, we create a two-stage pipeline to review task predictions and eliminate potential incorrect decisions before rationalization, enabling trustworthy rationale generation.
