Patrizio Giovannotti

Calibrated Large Language Models for Binary Question Answering

Jul 01, 2024

Patrizio Giovannotti, Alexander Gammerman

Abstract: Quantifying the uncertainty of predictions made by large language models (LLMs) in binary text classification tasks remains a challenge. Calibration, in the context of LLMs, refers to the alignment between the model's predicted probabilities and the actual correctness of its predictions. A well-calibrated model should produce probabilities that accurately reflect the likelihood of its predictions being correct. We propose a novel approach that utilizes the inductive Venn--Abers predictor (IVAP) to calibrate the probabilities associated with the output tokens corresponding to the binary labels. Our experiments on the BoolQ dataset using the Llama 2 model demonstrate that IVAP consistently outperforms the commonly used temperature scaling method for various label token choices, achieving well-calibrated probabilities while maintaining high predictive quality. Our findings contribute to the understanding of calibration techniques for LLMs and provide a practical solution for obtaining reliable uncertainty estimates in binary question answering tasks, enhancing the interpretability and trustworthiness of LLM predictions.

* Accepted to COPA 2024 (13th Symposium on Conformal and Probabilistic Prediction with Applications)
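The temperature scaling baseline named in the abstract can be illustrated with a minimal sketch. It assumes the model exposes one logit per label token (for example "yes" and "no") and that the temperature is chosen by grid search on a held-out calibration set; the function names, the grid, and the toy logits are illustrative assumptions, not details taken from the paper.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def yes_probability(logit_yes, logit_no, temperature=1.0):
    """Softmax over the two label-token logits, divided by a temperature.

    For two classes this reduces to a sigmoid of the scaled logit difference.
    """
    return sigmoid((logit_yes - logit_no) / temperature)


def negative_log_likelihood(logit_pairs, labels, temperature):
    """Mean log loss of the temperature-scaled probabilities (label 1 = "yes")."""
    total = 0.0
    for (logit_yes, logit_no), label in zip(logit_pairs, labels):
        p = yes_probability(logit_yes, logit_no, temperature)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(labels)


def fit_temperature(logit_pairs, labels, grid=None):
    """Pick the temperature with the lowest calibration-set log loss.

    Grid search is an assumption made for simplicity; gradient-based
    optimisation is also common.
    """
    if grid is None:
        grid = [0.1 * k for k in range(1, 101)]  # 0.1, 0.2, ..., 10.0
    return min(grid, key=lambda t: negative_log_likelihood(logit_pairs, labels, t))


# Toy calibration set: the model is very confident but only 75% accurate.
cal_logits = [(4.0, 0.0)] * 4 + [(0.0, 4.0)] * 4
cal_labels = [1, 1, 1, 0, 0, 0, 0, 1]
T = fit_temperature(cal_logits, cal_labels)
print(T, yes_probability(4.0, 0.0, T))
```

A fitted temperature above 1 softens overconfident probabilities towards 0.5. Because a single scalar rescales every prediction in the same way, the method cannot correct miscalibration that varies across the score range.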


Evaluating Machine Translation Quality with Conformal Predictive Distributions

Jun 02, 2023

Patrizio Giovannotti

Abstract: This paper presents a new approach for assessing uncertainty in machine translation by simultaneously evaluating translation quality and providing a reliable confidence score. Our approach utilizes conformal predictive distributions to produce prediction intervals with guaranteed coverage, meaning that for any given significance level $\epsilon$, we can expect the true quality score of a translation to fall within the interval at a rate of $1-\epsilon$. In this paper, we demonstrate how our method outperforms a simple but effective baseline on six different language pairs in terms of coverage and sharpness. Furthermore, we validate that our approach requires the data exchangeability assumption to hold for optimal performance.

* Accepted at the 12th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2023
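The coverage guarantee described in the abstract can be illustrated with a simpler relative of conformal predictive distributions: split conformal regression with absolute-residual scores. This is a sketch of the general idea rather than the paper's method; the function name, the integer quality scale, and the toy data are assumptions made for illustration.

```python
import math


def split_conformal_interval(cal_preds, cal_truths, test_pred, epsilon):
    """Prediction interval covering the true score with probability >= 1 - epsilon.

    Assumes calibration and test examples are exchangeable. Uses the
    ceil((n + 1) * (1 - epsilon))-th smallest absolute residual as half-width.
    """
    residuals = sorted(abs(y - p) for p, y in zip(cal_preds, cal_truths))
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        # Too few calibration points for this significance level.
        return (-math.inf, math.inf)
    half_width = residuals[k - 1]
    return (test_pred - half_width, test_pred + half_width)


# Toy example: predicted vs. human quality scores on a 0-100 scale.
cal_preds = [50, 50, 50, 50, 50, 50, 50]
cal_truths = [51, 52, 53, 54, 55, 56, 57]
low, high = split_conformal_interval(cal_preds, cal_truths, test_pred=60, epsilon=0.25)
print(low, high)
```

Smaller values of epsilon widen the interval, which is the trade-off between coverage and sharpness that the abstract refers to. If the exchangeability assumption fails, for instance under domain shift between calibration and test data, the coverage guarantee no longer applies.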


Calibration of Natural Language Understanding Models with Venn--ABERS Predictors

May 21, 2022

Patrizio Giovannotti

Abstract: Transformers, currently the state of the art in natural language understanding (NLU) tasks, are prone to generating uncalibrated predictions or extreme probabilities, which makes it relatively difficult to base decisions on their output. In this paper we propose to build several inductive Venn--ABERS predictors (IVAP), which are guaranteed to be well calibrated under minimal assumptions, based on a selection of pre-trained transformers. We test their performance over a set of diverse NLU tasks and show that they are capable of producing well-calibrated probabilistic predictions that are uniformly spread over the [0,1] interval -- all while retaining the original model's predictive accuracy.

* Submitted to the 11th Symposium on Conformal and Probabilistic Prediction with Applications - COPA 2022
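A minimal sketch of how an inductive Venn--ABERS predictor can be computed for one test example follows. It assumes an underlying classifier that outputs a real-valued score per example and a held-out calibration set with binary labels. Isotonic regression is fitted twice, once with the test example labelled 0 and once labelled 1, which gives a probability pair (p0, p1). The helper names are hypothetical, and the merging rule p1 / (1 - p0 + p1) is a common choice for log loss that is assumed here rather than confirmed by this listing.

```python
from collections import defaultdict


def pava(values, weights):
    """Pool-adjacent-violators: weighted non-decreasing least-squares fit."""
    blocks = []  # each block is [mean, weight, number_of_points]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_fit_at(scores, labels, query):
    """Value at `query` of the isotonic regression of labels on scores.

    `query` must be one of the scores. Tied scores are pooled first.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for s, y in zip(scores, labels):
        sums[s] += y
        counts[s] += 1
    xs = sorted(sums)
    means = [sums[x] / counts[x] for x in xs]
    weights = [counts[x] for x in xs]
    return pava(means, weights)[xs.index(query)]


def ivap_predict(cal_scores, cal_labels, test_score):
    """Return (p0, p1): calibrated probabilities of label 1 under each
    hypothetical label for the test example."""
    scores = list(cal_scores) + [test_score]
    p0 = isotonic_fit_at(scores, list(cal_labels) + [0], test_score)
    p1 = isotonic_fit_at(scores, list(cal_labels) + [1], test_score)
    return p0, p1


def merge_probabilities(p0, p1):
    """Collapse the pair into one probability (assumed log-loss merging rule)."""
    return p1 / (1.0 - p0 + p1)


cal_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
cal_labels = [0, 0, 1, 0, 1, 1]
p0, p1 = ivap_predict(cal_scores, cal_labels, 0.5)
print(p0, p1, merge_probabilities(p0, p1))
```

The width p1 - p0 gives a rough indication of how uncertain the calibrated estimate is. Refitting the isotonic regression for every test example, as done here for clarity, is costly on large calibration sets.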
