Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xindi Wang

Generative Large Language Models Trained for Detecting Errors in Radiology Reports

Apr 06, 2025

Cong Sun, Kurt Teichman, Yiliang Zhou, Brian Critelli, David Nauheim, Graham Keir, Xindi Wang, Judy Zhong, Adam E Flanders, George Shih(+1 more)

Abstract:In this retrospective study, a dataset was constructed with two parts. The first part included 1,656 synthetic chest radiology reports generated by GPT-4 using specified prompts, with 828 being error-free synthetic reports and 828 containing errors. The second part included 614 reports: 307 error-free reports between 2011 and 2016 from the MIMIC-CXR database and 307 corresponding synthetic reports with errors generated by GPT-4 on the basis of these MIMIC-CXR reports and specified prompts. All errors were categorized into four types: negation, left/right, interval change, and transcription errors. Then, several models, including Llama-3, GPT-4, and BiomedBERT, were refined using zero-shot prompting, few-shot prompting, or fine-tuning strategies. Finally, the performance of these models was evaluated using the F1 score, 95\% confidence interval (CI) and paired-sample t-tests on our constructed dataset, with the prediction results further assessed by radiologists. Using zero-shot prompting, the fine-tuned Llama-3-70B-Instruct model achieved the best performance with the following F1 scores: 0.769 for negation errors, 0.772 for left/right errors, 0.750 for interval change errors, 0.828 for transcription errors, and 0.780 overall. In the real-world evaluation phase, two radiologists reviewed 200 randomly selected reports output by the model. Of these, 99 were confirmed to contain errors detected by the models by both radiologists, and 163 were confirmed to contain model-detected errors by at least one radiologist. Generative LLMs, fine-tuned on synthetic and MIMIC-CXR radiology reports, greatly enhanced error detection in radiology reports.

Via

Access Paper or Ask Questions

Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification

Oct 03, 2024

Sudipta Singha Roy, Xindi Wang, Robert E. Mercer, Frank Rudzicz

Abstract:Long document classification presents challenges in capturing both local and global dependencies due to their extensive content and complex structure. Existing methods often struggle with token limits and fail to adequately model hierarchical relationships within documents. To address these constraints, we propose a novel model leveraging a graph-tree structure. Our approach integrates syntax trees for sentence encodings and document graphs for document encodings, which capture fine-grained syntactic relationships and broader document contexts, respectively. We use Tree Transformers to generate sentence encodings, while a graph attention network models inter- and intra-sentence dependencies. During training, we implement bidirectional information propagation from word-to-sentence-to-document and vice versa, which enriches the contextual representation. Our proposed method enables a comprehensive understanding of content at all hierarchical levels and effectively handles arbitrarily long contexts without token limit constraints. Experimental results demonstrate the effectiveness of our approach in all types of long document classification tasks.

* accepted to EMNLP findings 2024

Via

Access Paper or Ask Questions

Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels

Sep 25, 2024

Yishu Wei, Xindi Wang, Hanley Ong, Yiliang Zhou, Adam Flanders, George Shih, Yifan Peng

Abstract:Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled datasets. In this work, we investigated the potential of improving a lightweight LLM, such as Llama 3.1-8B, through fine-tuning with datasets using synthetic labels. Two tasks are jointly trained by combining their respective instruction datasets. When the quality of the task-specific synthetic labels is relatively high (e.g., generated by GPT4- o), Llama 3.1-8B achieves satisfactory performance on the open-ended disease detection task, with a micro F1 score of 0.91. Conversely, when the quality of the task-relevant synthetic labels is relatively low (e.g., from the MIMIC-CXR dataset), fine-tuned Llama 3.1-8B is able to surpass its noisy teacher labels (micro F1 score of 0.67 v.s. 0.63) when calibrated against curated labels, indicating the strong inherent underlying capability of the model. These findings demonstrate the potential of fine-tuning LLMs with synthetic labels, offering a promising direction for future research on LLM specialization in the medical domain.

Via

Access Paper or Ask Questions

Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

May 29, 2024

Xindi Wang, Robert E. Mercer, Frank Rudzicz

Figure 1 for Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Figure 2 for Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Figure 3 for Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Figure 4 for Multi-stage Retrieve and Re-rank Model for Automatic Medical Coding Recommendation

Abstract:The International Classification of Diseases (ICD) serves as a definitive medical classification system encompassing a wide range of diseases and conditions. The primary objective of ICD indexing is to allocate a subset of ICD codes to a medical record, which facilitates standardized documentation and management of various health conditions. Most existing approaches have suffered from selecting the proper label subsets from an extremely large ICD collection with a heavy long-tailed label distribution. In this paper, we leverage a multi-stage ``retrieve and re-rank'' framework as a novel solution to ICD indexing, via a hybrid discrete retrieval method, and re-rank retrieved candidates with contrastive learning that allows the model to make more accurate predictions from a simplified label space. The retrieval model is a hybrid of auxiliary knowledge of the electronic health records (EHR) and a discrete retrieval method (BM25), which efficiently collects high-quality candidates. In the last stage, we propose a label co-occurrence guided contrastive re-ranking model, which re-ranks the candidate labels by pulling together the clinical notes with positive ICD codes. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures on the MIMIC-III benchmark.

* Accepted to NAACL 2024 -- camera-ready version

Via

Access Paper or Ask Questions

Auxiliary Knowledge-Induced Learning for Automatic Multi-Label Medical Document Classification

May 29, 2024

Xindi Wang, Robert E. Mercer, Frank Rudzicz

Abstract:The International Classification of Diseases (ICD) is an authoritative medical classification system of different diseases and conditions for clinical and management purposes. ICD indexing assigns a subset of ICD codes to a medical record. Since human coding is labour-intensive and error-prone, many studies employ machine learning to automate the coding process. ICD coding is a challenging task, as it needs to assign multiple codes to each medical document from an extremely large hierarchically organized collection. In this paper, we propose a novel approach for ICD indexing that adopts three ideas: (1) we use a multi-level deep dilated residual convolution encoder to aggregate the information from the clinical notes and learn document representations across different lengths of the texts; (2) we formalize the task of ICD classification with auxiliary knowledge of the medical records, which incorporates not only the clinical texts but also different clinical code terminologies and drug prescriptions for better inferring the ICD codes; and (3) we introduce a graph convolutional network to leverage the co-occurrence patterns among ICD codes, aiming to enhance the quality of label representations. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures.

* Accepted to LREC-COLING 2024 -- camera-ready version

Via

Access Paper or Ask Questions

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Feb 03, 2024

Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi

Abstract:Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies is discussed in the last section along with the suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.

Via

Access Paper or Ask Questions

Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning

Aug 01, 2023

Xindi Wang, Yufei Wang, Can Xu, Xiubo Geng, Bowen Zhang, Chongyang Tao, Frank Rudzicz, Robert E. Mercer, Daxin Jiang

Abstract:Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained. However, despite the success of LLMs, there has been little understanding of how ICL learns the knowledge from the given prompts. In this paper, to make progress toward understanding the learning behaviour of ICL, we train the same LLMs with the same demonstration examples via ICL and supervised learning (SL), respectively, and investigate their performance under label perturbations (i.e., noisy labels and label imbalance) on a range of classification tasks. First, via extensive experiments, we find that gold labels have significant impacts on the downstream in-context performance, especially for large language models; however, imbalanced labels matter little to ICL across all model sizes. Second, when comparing with SL, we show empirically that ICL is less sensitive to label perturbations than SL, and ICL gradually attains comparable performance to SL as the model size increases.

* accepted to ECAI 2023 (camera-ready)

Via

Access Paper or Ask Questions

MeSHup: A Corpus for Full Text Biomedical Document Indexing

Apr 28, 2022

Xindi Wang, Robert E. Mercer, Frank Rudzicz

Figure 1 for MeSHup: A Corpus for Full Text Biomedical Document Indexing

Figure 2 for MeSHup: A Corpus for Full Text Biomedical Document Indexing

Figure 3 for MeSHup: A Corpus for Full Text Biomedical Document Indexing

Figure 4 for MeSHup: A Corpus for Full Text Biomedical Document Indexing

Abstract:Medical Subject Heading (MeSH) indexing refers to the problem of assigning a given biomedical document with the most relevant labels from an extremely large set of MeSH terms. Currently, the vast number of biomedical articles in the PubMed database are manually annotated by human curators, which is time consuming and costly; therefore, a computational system that can assist the indexing is highly valuable. When developing supervised MeSH indexing systems, the availability of a large-scale annotated text corpus is desirable. A publicly available, large corpus that permits robust evaluation and comparison of various systems is important to the research community. We release a large scale annotated MeSH indexing corpus, MeSHup, which contains 1,342,667 full text articles in English, together with the associated MeSH labels and metadata, authors, and publication venues that are collected from the MEDLINE database. We train an end-to-end model that combines features from documents and their associated labels on our corpus and report the new baseline.

* LREC 2022 main conference

Via

Access Paper or Ask Questions

KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling

Mar 14, 2022

Xindi Wang, Robert E. Mercer, Frank Rudzicz

Figure 1 for KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling

Figure 2 for KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling

Figure 3 for KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling

Figure 4 for KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling

Abstract:Currently, Medical Subject Headings (MeSH) are manually assigned to every biomedical article published and subsequently recorded in the PubMed database to facilitate retrieving relevant information. With the rapid growth of the PubMed database, large-scale biomedical document indexing becomes increasingly important. MeSH indexing is a challenging task for machine learning, as it needs to assign multiple labels to each article from an extremely large hierachically organized collection. To address this challenge, we propose KenMeSH, an end-to-end model that combines new text features and a dynamic \textbf{K}nowledge-\textbf{en}hanced mask attention that integrates document features with MeSH label hierarchy and journal correlation features to index MeSH terms. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures.

* main conference at ACL 2022

Via

Access Paper or Ask Questions

Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Nov 19, 2020

John Chen, Ian Berlot-Atwell, Safwan Hossain, Xindi Wang, Frank Rudzicz

Figure 1 for Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Figure 2 for Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Figure 3 for Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Figure 4 for Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Abstract:Clinical machine learning is increasingly multimodal, collected in both structured tabular formats and unstructured forms such as freetext. We propose a novel task of exploring fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm - equalized odds post processing - and compare it to a text-specific fairness algorithm: debiased clinical word embeddings. Despite the fact that debiased word embeddings do not explicitly address equalized odds of protected groups, we show that a text-specific approach to fairness may simultaneously achieve a good balance of performance and classical notions of fairness. We hope that our paper inspires future contributions at the critical intersection of clinical NLP and fairness. The full source code is available here: https://github.com/johntiger1/multimodal_fairness

* Proceedings of the 3rd Clinical Natural Language Processing Workshop (2020), pages 301--312
* Best paper award at 3rd Clinical Natural Language Processing Workshop at EMNLP 2020

Via

Access Paper or Ask Questions