Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lalla Mouatadid

SPUQ: Perturbation-Based Uncertainty Quantification for Large Language Models

Mar 04, 2024

Xiang Gao, Jiaxin Zhang, Lalla Mouatadid, Kamalika Das

Abstract:In recent years, large language models (LLMs) have become increasingly prevalent, offering remarkable text generation capabilities. However, a pressing challenge is their tendency to make confidently wrong predictions, highlighting the critical need for uncertainty quantification (UQ) in LLMs. While previous works have mainly focused on addressing aleatoric uncertainty, the full spectrum of uncertainties, including epistemic, remains inadequately explored. Motivated by this gap, we introduce a novel UQ method, sampling with perturbation for UQ (SPUQ), designed to tackle both aleatoric and epistemic uncertainties. The method entails generating a set of perturbations for LLM inputs, sampling outputs for each perturbation, and incorporating an aggregation module that generalizes the sampling uncertainty approach for text generation tasks. Through extensive experiments on various datasets, we investigated different perturbation and aggregation techniques. Our findings show a substantial improvement in model uncertainty calibration, with a reduction in Expected Calibration Error (ECE) by 50\% on average. Our findings suggest that our proposed UQ method offers promising steps toward enhancing the reliability and trustworthiness of LLMs.

* Accepted to appear at EACL 2024

Via

Access Paper or Ask Questions

DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

Nov 16, 2023

Jiaxin Zhang, Joy Rimchala, Lalla Mouatadid, Kamalika Das, Sricharan Kumar

Figure 1 for DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

Figure 2 for DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

Figure 3 for DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

Figure 4 for DECDM: Document Enhancement using Cycle-Consistent Diffusion Models

Abstract:The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data pairs, which raises concerns about data separation and privacy protection, and makes it challenging to adapt these methods to new domain pairs. To address these issues, we propose DECDM, an end-to-end document-level image translation method inspired by recent advances in diffusion models. Our method overcomes the limitations of paired training by independently training the source (noisy input) and target (clean output) models, making it possible to apply domain-specific diffusion models to other pairs. DECDM trains on one dataset at a time, eliminating the need to scan both datasets concurrently, and effectively preserving data privacy from the source or target domain. We also introduce simple data augmentation strategies to improve character-glyph conservation during translation. We compare DECDM with state-of-the-art methods on multiple synthetic data and benchmark datasets, such as document denoising and {\color{black}shadow} removal, and demonstrate the superiority of performance quantitatively and qualitatively.

* Accepted by IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

Via

Access Paper or Ask Questions

RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents

May 24, 2023

Pritika Ramu, Sijia Wang, Lalla Mouatadid, Joy Rimchala, Lifu Huang

Abstract:Current research in form understanding predominantly relies on large pre-trained language models, necessitating extensive data for pre-training. However, the importance of layout structure (i.e., the spatial relationship between the entity blocks in the visually rich document) to relation extraction has been overlooked. In this paper, we propose REgion-Aware Relation Extraction (RE$^2$) that leverages region-level spatial structure among the entity blocks to improve their relation prediction. We design an edge-aware graph attention network to learn the interaction between entities while considering their spatial relationship defined by their region-level representations. We also introduce a constraint objective to regularize the model towards consistency with the inherent constraints of the relation extraction task. Extensive experiments across various datasets, languages and domains demonstrate the superiority of our proposed approach.

Via

Access Paper or Ask Questions

A Scalable Technique for Weak-Supervised Learning with Domain Constraints

Jan 12, 2023

Sudhir Agarwal, Anu Sreepathy, Lalla Mouatadid

Abstract:We propose a novel scalable end-to-end pipeline that uses symbolic domain knowledge as constraints for learning a neural network for classifying unlabeled data in a weak-supervised manner. Our approach is particularly well-suited for settings where the data consists of distinct groups (classes) that lends itself to clustering-friendly representation learning and the domain constraints can be reformulated for use of efficient mathematical optimization techniques by considering multiple training examples at once. We evaluate our approach on a variant of the MNIST image classification problem where a training example consists of image sequences and the sum of the numbers represented by the sequences, and show that our approach scales significantly better than previous approaches that rely on computing all constraint satisfying combinations for each training example.

* 6 pages, 4 figures. Accepted at NeurIPS 2022 Workshop "Has it trained yet"

Via

Access Paper or Ask Questions