Guanghua Xiao

Large language models enabled multiagent ensemble method for efficient EHR data labeling

Oct 21, 2024

Jingwei Huang, Kuroush Nezafati, Ismael Villanueva-Miranda, Zifan Gu, Ann Marie Navar, Tingyi Wanyan, Qin Zhou, Bo Yao, Ruichen Rong, Xiaowei Zhan(+4 more)

Abstract: This study introduces a novel multiagent ensemble method powered by LLMs to address a key challenge in ML: data labeling, particularly in large-scale EHR datasets. Manual labeling of such datasets requires domain expertise and is labor-intensive, time-consuming, expensive, and error-prone. To overcome this bottleneck, we developed an ensemble LLMs method and demonstrated its effectiveness in two real-world tasks: (1) labeling a large-scale unlabeled ECG dataset in MIMIC-IV; (2) identifying social determinants of health (SDOH) from the clinical notes of EHR. Trading off benefits and cost, we selected a pool of diverse open source LLMs with satisfactory performance. We treat each LLM's prediction as a vote and apply a mechanism of majority voting with a minimal winning threshold for the ensemble. We implemented an ensemble LLMs application for EHR data labeling tasks. By using the ensemble LLMs and natural language processing, we labeled the MIMIC-IV ECG dataset of 623,566 ECG reports with an estimated accuracy of 98.2%. We applied the ensemble LLMs method to identify SDOH from social history sections of 1,405 EHR clinical notes, also achieving competitive performance. Our experiments show that the ensemble LLMs can outperform any individual LLM, even the best commercial one, and that the method reduces hallucination errors. From the research, we found that (1) the ensemble LLMs method significantly reduces the time and effort required for labeling large-scale EHR data, automating the process with high accuracy and quality; (2) the method generalizes well to other text data labeling tasks, as shown by its application to SDOH identification; (3) the ensemble of a group of diverse LLMs can outperform or match the performance of the best individual LLM; and (4) the ensemble method substantially reduces hallucination errors. This approach provides a scalable and efficient solution to data-labeling challenges.

* 27 pages, 13 figures. Under journal review
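The voting rule described in the abstract (each LLM's prediction counts as one vote, and the winner must reach a minimal winning threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `ensemble_vote`, the `min_votes` parameter, the example labels, and the choice to abstain on ties or when the threshold is not met are all assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def ensemble_vote(votes: Sequence[str], min_votes: int) -> Optional[str]:
    """Return the most-voted label if it reaches min_votes, else None.

    Hypothetical helper: returning None (abstaining) on a tie or when
    the threshold is not met is an assumption, not taken from the paper.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common(2)
    top_label, top_count = ranked[0]
    # A tie for first place means there is no unique winner.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label if top_count >= min_votes else None


# Five hypothetical model outputs for one report, threshold of 3 votes.
label = ensemble_vote(["AF", "AF", "normal", "AF", "normal"], min_votes=3)
```

Items on which the ensemble abstains could, for example, be routed to manual review; the abstract does not say how such cases are handled.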

Discovering Clinically Meaningful Shape Features for the Analysis of Tumor Pathology Images

Dec 09, 2020

Esteban Fernández Morales, Cong Zhang, Guanghua Xiao, Chul Moon, Qiwei Li

Abstract: With advances in imaging technology, digital pathology imaging of tumor tissue slides is becoming a routine clinical procedure for cancer diagnosis. This process produces massive imaging data that capture histological details in high resolution. Recent developments in deep-learning methods have enabled us to automatically detect and characterize the tumor regions in pathology images at large scale. From each identified tumor region, we extracted 30 well-defined descriptors that quantify its shape, geometry, and topology. We demonstrated how those descriptor features were associated with patient survival outcome in lung adenocarcinoma patients from the National Lung Screening Trial (n=143). In addition, a descriptor-based prognostic model was developed and validated in an independent patient cohort from The Cancer Genome Atlas Program (n=318). This study proposes new insights into the relationship between tumor shape, geometrical, and topological features and patient prognosis. We provide software in the form of R code on GitHub: https://github.com/estfernandez/Slide_Image_Segmentation_and_Extraction.

Predicting survival outcomes using topological features of tumor pathology images

Dec 07, 2020

Chul Moon, Qiwei Li, Guanghua Xiao

Abstract: Tumor shape and size have been used as important markers for cancer diagnosis and treatment. Recent developments in medical imaging technology enable more detailed segmentation of tumor regions in high resolution. This paper proposes a topological feature to characterize tumor progression from digital pathology images and examines its effect on the time-to-event data. We develop a distance transform for pathology images and show that a topological summary statistic computed by persistent homology quantifies tumor shape, size, distribution, and connectivity. The topological features are represented in functional space and used as functional predictors in a functional Cox regression model. A case study is conducted using non-small cell lung cancer pathology images. The results show that the topological features predict survival prognosis after adjusting for age, sex, smoking status, stage, and size of tumors. Also, the topological features with non-zero effects correspond to the shapes that are known to be related to tumor progression. Our study provides a new perspective for understanding tumor shape and patient prognosis.
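To make the distance-transform step more concrete, here is a small sketch of one common variant, a signed Euclidean distance transform on a binary tumor mask. The abstract does not specify which transform the paper develops, so the variant, the sign convention (positive inside the tumor, negative outside), and the function name `signed_distance_transform` are assumptions. The brute-force search is only suitable for tiny illustrative grids.

```python
import math
from typing import List


def signed_distance_transform(mask: List[List[int]]) -> List[List[float]]:
    """Signed Euclidean distance from each pixel to the nearest pixel
    of the opposite class.

    Assumed convention: tumor pixels (1) get positive values and
    non-tumor pixels (0) get negative values. Brute force, O(n^2) in
    the number of pixels.
    """
    rows, cols = len(mask), len(mask[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    out = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        # Distance to the closest pixel whose class differs from this one;
        # infinite if the mask contains only one class.
        d = min(
            (math.hypot(r - r2, c - c2)
             for r2, c2 in cells if mask[r2][c2] != mask[r][c]),
            default=math.inf,
        )
        out[r][c] = d if mask[r][c] else -d
    return out


# A single tumor pixel in the middle of a 3x3 patch.
field = signed_distance_transform([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

A scalar field of this kind is the sort of input on which a persistent-homology summary could then be computed; that step is not shown here.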

ConvPath: A Software Tool for Lung Adenocarcinoma Digital Pathological Image Analysis Aided by Convolutional Neural Network

Sep 20, 2018

Shidan Wang, Tao Wang, Lin Yang, Faliu Yi, Xin Luo, Yikun Yang, Adi Gazdar, Junya Fujimoto, Ignacio I. Wistuba, Bo Yao(+4 more)

Abstract: The spatial distributions of different types of cells could reveal a cancer cell growth pattern, its relationships with the tumor microenvironment and the immune response of the body, all of which represent key hallmarks of cancer. However, manually recognizing and localizing all the cells in pathology slides is almost impossible. In this study, we developed an automated cell type classification pipeline, ConvPath, which includes nuclei segmentation, convolutional neural network-based classification of tumor cells, stromal cells, and lymphocytes, and extraction of tumor microenvironment-related features for lung cancer pathology images. The overall classification accuracy is 92.9% and 90.1% in training and independent testing datasets, respectively. By identifying cells and classifying cell types, this pipeline can convert a pathology image into a spatial map of tumor cells, stromal cells, and lymphocytes. From this spatial map, we can extract features that characterize the tumor microenvironment. Based on these features, we developed an image feature-based prognostic model and validated the model in two independent cohorts. The predicted risk group serves as an independent prognostic factor, after adjusting for clinical variables that include age, gender, smoking status, and stage.