Jianlin Shi

University of Utah, Salt Lake City, UT, USA

Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification

Apr 16, 2025

Jianlin Shi, Qiwei Gan, Elizabeth Hanchrow, Annie Bowles, John Stanley, Adam P. Bress, Jordana B. Cohen, Patrick R. Alba

Abstract: Clinical natural language processing (NLP) is increasingly in demand in both clinical research and operational practice. However, most state-of-the-art solutions are transformer-based and require substantial computational resources, limiting their accessibility. We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based model to improve efficiency while maintaining accuracy. We applied this framework in a dementia identification case study involving 4.9 million veterans with incident hypertension, analyzing 2.1 billion clinical notes. At the patient level, our method achieved a precision of 0.90, a recall of 0.84, and an F1-score of 0.87. Additionally, this NLP approach identified over three times as many dementia cases as structured data methods. All processing was completed in approximately two weeks using a single machine with dual A40 GPUs. This study demonstrates the feasibility of hybrid NLP solutions for large-scale clinical text analysis, making state-of-the-art methods more accessible to healthcare organizations with limited computational resources.

* This manuscript has been submitted to the AMIA 2025 Annual Symposium (https://amia.org/education-events/amia-2025-annual-symposium)
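
The abstract above names three stages (rule-based filtering, an SVM classifier, and a BERT-based model) but does not say how notes are routed between them. The following is a minimal sketch of one plausible routing, assuming a cascade in which the expensive model only sees notes that the cheaper stages cannot decide. The keyword list, the `low`/`high` thresholds, and the stand-in scoring functions are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable, List

# Hypothetical keyword list; the paper's actual rules are not given in the abstract.
DEMENTIA_KEYWORDS = ("dementia", "alzheimer", "cognitive impairment")


def rule_filter(note: str) -> bool:
    """Stage 1: cheap keyword screen; notes without any keyword are dropped."""
    text = note.lower()
    return any(keyword in text for keyword in DEMENTIA_KEYWORDS)


def classify_notes(
    notes: Iterable[str],
    light_score: Callable[[str], float],   # stand-in for an SVM probability
    heavy_predict: Callable[[str], bool],  # stand-in for a BERT-based model
    low: float = 0.2,                      # assumed confidence thresholds
    high: float = 0.8,
) -> List[bool]:
    """Label each note using the cheapest stage that can decide it."""
    labels = []
    for note in notes:
        if not rule_filter(note):
            labels.append(False)           # decided by rules alone
            continue
        score = light_score(note)
        if score <= low:
            labels.append(False)           # light model is confident: negative
        elif score >= high:
            labels.append(True)            # light model is confident: positive
        else:
            labels.append(heavy_predict(note))  # only uncertain notes reach the heavy model
    return labels


# Toy stand-ins so the sketch runs without scikit-learn or a GPU.
heavy_calls = []


def toy_light_score(note: str) -> float:
    text = note.lower()
    if "no evidence of dementia" in text:
        return 0.1
    if "diagnosed with" in text:
        return 0.9
    return 0.5


def toy_heavy_predict(note: str) -> bool:
    heavy_calls.append(note)
    return "family history" not in note.lower()


example_notes = [
    "Patient seen for hypertension follow-up.",
    "No evidence of dementia on exam.",
    "Diagnosed with Alzheimer disease in 2019.",
    "Family history of dementia; memory intact.",
]
example_labels = classify_notes(example_notes, toy_light_score, toy_heavy_predict)
```

In this toy run only one of the four notes reaches `toy_heavy_predict`, which illustrates how a cascade of this kind could reduce GPU demand; the real framework may combine its stages differently.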

Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python

Jun 14, 2021

Hannah Eyre, Alec B Chapman, Kelly S Peterson, Jianlin Shi, Patrick R Alba, Makoto M Jones, Tamara L Box, Scott L DuVall, Olga V Patterson

Abstract: Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.

* Accepted to AMIA Annual Symposium 2021

A generic rule-based system for clinical trial patient selection

Jul 16, 2019

Jianlin Shi, Kevin Graves, John F. Hurdle

Abstract: The n2c2 2018 Challenge task 1 aimed to identify patients who meet lists of heterogeneous inclusion/exclusion criteria for a hypothetical clinical trial. We demonstrate that a generic rule-based natural language pipeline can support this task with decent performance (the average F1 score on the test set is 0.89, ranked 8th out of 45 teams).

FastContext: an efficient and scalable implementation of the ConText algorithm

Apr 30, 2019

Jianlin Shi, John F. Hurdle

Abstract: Objective: To develop and evaluate FastContext, an efficient, scalable implementation of the ConText algorithm suitable for very large-scale clinical natural language processing. Background: The ConText algorithm performs with state-of-the-art accuracy in detecting the experiencer, negation status, and temporality of concept mentions in clinical narratives. However, the speed limitation of its current implementations hinders its use in big data processing. Methods: We developed FastContext by hashing the ConText rules, then compared its speed and accuracy with JavaConText and GeneralConText, two widely used Java implementations. Results: FastContext ran two orders of magnitude faster and slowed down less as the number of rules increased than the other two implementations used in this study for comparison. Additionally, FastContext consistently improved in accuracy as rules were added (the desired outcome of adding new rules), while the other two implementations did not. Conclusions: FastContext is an efficient, scalable implementation of the popular ConText algorithm, suitable for natural language applications on very large clinical corpora.

* Journal of Biomedical Informatics, August 6, 2018
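
The FastContext abstract says the speedup comes from hashing the ConText rules, without giving details. As a rough illustration of why hashed lookup scales well, the sketch below stores trigger phrases as nested dictionaries keyed by token, so matching at a given position costs time proportional to the phrase length rather than to the number of rules. This is a simplified assumption-based sketch, not the FastContext implementation; the function names, the sentinel key, and the example rules are hypothetical.

```python
from typing import Dict, List, Tuple

_END = object()  # hypothetical sentinel key marking the end of a complete rule


def build_rule_index(rules: Dict[str, str]) -> dict:
    """Store each trigger phrase as a path of nested dicts keyed by lowercase token."""
    root: dict = {}
    for phrase, label in rules.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[_END] = label
    return root


def find_triggers(tokens: List[str], index: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for the longest rule matching at each position."""
    matches = []
    i = 0
    while i < len(tokens):
        node = index
        best = None
        j = i
        # Walk the nested dicts one token at a time; each step is a hash lookup.
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if _END in node:
                best = (i, j, node[_END])  # remember the longest complete rule so far
        if best is not None:
            matches.append(best)
            i = best[1]                    # resume after the matched phrase
        else:
            i += 1
    return matches


# Hypothetical example rules: trigger phrase -> context label.
example_rules = {
    "no": "negated",
    "no evidence of": "negated",
    "history of": "historical",
}
example_index = build_rule_index(example_rules)
example_tokens = "There is no evidence of dementia but a history of stroke".split()
example_matches = find_triggers(example_tokens, example_index)
```

Adding more rules to `example_rules` enlarges the index but does not add work per token position, which is consistent with the abstract's observation that speed degrades less as rules are added.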