Alice Heiman

The Accuracy, Robustness, and Readability of LLM-Generated Sustainability-Related Word Definitions

Feb 02, 2025

Alice Heiman

Abstract: A common language with standardized definitions is crucial for effective climate discussions. However, concerns exist about LLMs misrepresenting climate terms. We compared 300 official IPCC glossary definitions with those generated by GPT-4o-mini, Llama3.1 8B, and Mistral 7B, analyzing adherence, robustness, and readability using SBERT sentence embeddings. The LLMs scored an average adherence of $0.57-0.59 \pm 0.15$, and their definitions proved harder to read than the originals. Model-generated definitions vary mainly among words with multiple or ambiguous definitions, showing the potential to highlight terms that need standardization. The results show how LLMs could support environmental discourse while emphasizing the need to align model outputs with established terminology for clarity and consistency.

* NLP4Ecology Workshop 2025

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

Nov 27, 2024

Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, Pranav Rajpurkar

Abstract: Medical vision-language models often struggle to generate accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce FactCheXcker, a modular framework that de-hallucinates radiology report measurements by leveraging an improved query-code-update paradigm. Specifically, FactCheXcker employs specialized modules and the code generation capabilities of large language models to solve measurement queries generated from the original report. After extracting measurable findings, the results are incorporated into an updated report. We evaluate FactCheXcker on endotracheal tube placement, which accounts for an average of 78% of report measurements, using the MIMIC-CXR dataset and 11 medical report-generation models. Our results show that FactCheXcker significantly reduces hallucinations, improves measurement precision, and maintains the quality of the original reports. Specifically, FactCheXcker improves the performance of all 11 models and achieves an average improvement of 94.0% in reducing measurement hallucinations, as measured by mean absolute error.

GPT-SW3: An Autoregressive Language Model for the Nordic Languages

May 23, 2023

Ariel Ekgren, Amaru Cuba Gyllensten, Felix Stollenwerk, Joey Öhman, Tim Isbister, Evangelia Gogoulou, Fredrik Carlsson, Alice Heiman, Judit Casademont, Magnus Sahlgren

Abstract: This paper details the process of developing the first native large generative language model for the Nordic languages, GPT-SW3. We cover all parts of the development process, from data collection and processing, training configuration and instruction finetuning, to evaluation and considerations for release strategies. We hope that this paper can serve as a guide and reference for other researchers who undertake the development of large generative models for smaller languages.
