Katherine Collins

Large Language Models Must Be Taught to Know What They Don't Know

Jun 12, 2024

Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.

* Code available at: https://github.com/activatedgeek/calibration-tuning
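
As a rough illustration of what "good calibration" means in the abstract above, the sketch below computes expected calibration error (ECE), a common way to compare stated confidences with graded correct/incorrect answers. It is not taken from the paper or its repository: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made for this example.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Hypothetical helper: ECE with equal-width confidence bins on [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one graded example is required")

    # Per-bin running sums of confidence, number of correct answers, and count.
    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins

    for conf, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += int(is_correct)
        count[idx] += 1

    ece = 0.0
    for c_sum, hits, k in zip(conf_sum, hit_sum, count):
        if k == 0:
            continue  # empty bins contribute nothing
        accuracy = hits / k
        mean_conf = c_sum / k
        # Weight each bin's |accuracy - confidence| gap by its share of examples.
        ece += (k / total) * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Four answers, each claimed at 90% confidence, of which three are correct.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
```

A lower value means the stated confidences track the observed accuracy more closely; a model that claims 90% confidence but is right 75% of the time is overconfident by roughly 0.15 on those examples.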

Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction

Feb 27, 2024

Senjuti Dutta, Sherol Chen, Sunny Mak, Amnah Ahmad, Katherine Collins, Alena Butryna, Deepak Ramachandran, Krishnamurthy Dvijotham, Ellie Pavlick, Ravi Rajakumar

Abstract: Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators' subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models.
