Abstract:Heart failure affects millions of people worldwide, significantly reducing quality of life and leading to high mortality rates. Despite extensive research, the relationship between heart failure and mortality rates among ICU patients is not fully understood, indicating the need for more accurate prediction models. This study analyzed data from 1,177 patients over 18 years old from the MIMIC-III database, identified using ICD-9 codes. Preprocessing steps included handling missing data, removing duplicates, treating skewness, and using oversampling techniques to address data imbalances. Through rigorous feature selection using Variance Inflation Factor (VIF), expert clinical input, and ablation studies, 46 key features were identified to enhance model performance. Our analysis compared several machine learning models, including Logistic Regression, Support Vector Machine (SVM), Random Forest, LightGBM, and XGBoost. XGBoost emerged as the superior model, achieving a test AUC-ROC of 0.9228 (95\% CI 0.8748 - 0.9613), significantly outperforming our previous work (AUC-ROC of 0.8766) and the best results reported in existing literature (AUC-ROC of 0.824). The improved model's success is attributed to advanced feature selection methods, robust preprocessing techniques, and comprehensive hyperparameter optimization through Grid-Search. SHAP analysis and feature importance evaluations based on XGBoost highlighted key variables like leucocyte count and RDW, providing valuable insights into the clinical factors influencing mortality risk. This framework offers significant support for clinicians, enabling them to identify high-risk ICU heart failure patients and improve patient outcomes through timely and informed interventions.
Abstract:Background: Sepsis is a severe condition responsible for many deaths worldwide. Accurate prediction of sepsis outcomes is crucial for timely and effective treatment. Although previous studies have used ML to forecast outcomes, they faced limitations in feature selection and model comprehensibility, resulting in less effective predictions. Thus, this research aims to develop an interpretable and accurate ML model to help clinical professionals predict in-hospital mortality. Methods: We analyzed ICU patient records from the MIMIC-III database based on specific criteria and extracted relevant data. Our feature selection process included a literature review, clinical input refinement, and using Random Forest to select the top 35 features. We performed data preprocessing, including cleaning, imputation, standardization, and applied SMOTE for oversampling to address imbalance, resulting in 4,683 patients, with admission counts of 17,429. We compared the performance of Random Forest, Gradient Boosting, Logistic Regression, SVM, and KNN models. Results: The Random Forest model was the most effective in predicting sepsis-related in-hospital mortality. It outperformed other models, achieving an accuracy of 0.90 and an AUROC of 0.97, significantly better than the existing literature. Our meticulous feature selection contributed to the model's precision and identified critical determinants of sepsis mortality. These results underscore the pivotal role of data-driven ML in healthcare, especially for predicting in-hospital mortality due to sepsis. Conclusion: This study represents a significant advancement in predicting in-hospital sepsis mortality, highlighting the potential of ML in healthcare. The implications are profound, offering a data-driven approach that enhances decision-making in patient care and reduces in-hospital mortality.
Abstract:Background: Ventilator-associated pneumonia (VAP) in traumatic brain injury (TBI) patients poses a significant mortality risk and imposes a considerable financial burden on patients and healthcare systems. Timely detection and prognostication of VAP in TBI patients are crucial to improve patient outcomes and alleviate the strain on healthcare resources. Methods: We implemented six machine learning models using the MIMIC-III database. Our methodology included preprocessing steps, such as feature selection with CatBoost and expert opinion, addressing class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), and rigorous model tuning through 5-fold cross-validation to optimize hyperparameters. Key models evaluated included SVM, Logistic Regression, Random Forest, XGBoost, ANN, and AdaBoost. Additionally, we conducted SHAP analysis to determine feature importance and performed an ablation study to assess feature impacts on model performance. Results: XGBoost outperformed the baseline models and the best existing literature. We used metrics, including AUC, Accuracy, Specificity, Sensitivity, F1 Score, PPV, and NPV. XGBoost demonstrated the highest performance with an AUC of 0.940 and an Accuracy of 0.875, which are 23.4% and 23.5% higher than the best results in the existing literature, with an AUC of 0.706 and an Accuracy of 0.640, respectively. This enhanced performance underscores the models' effectiveness in clinical settings. Conclusions: This study enhances the predictive modeling of VAP in TBI patients, improving early detection and intervention potential. Refined feature selection and advanced ensemble techniques significantly boosted model accuracy and reliability, offering promising directions for future clinical applications and medical diagnostics research.
Abstract:Predicting critical health outcomes such as patient mortality and hospital readmission is essential for improving survivability. However, healthcare datasets have many concurrences that create complexities, leading to poor predictions. Consequently, pre-processing the data is crucial to improve its quality. In this study, we use an existing pre-processing algorithm, concatenation, to improve data quality by decreasing the complexity of datasets. Sixteen healthcare datasets were extracted from two databases - MIMIC III and University of Illinois Hospital - converted to the event logs, they were then fed into the concatenation algorithm. The pre-processed event logs were then fed to the Split Miner (SM) algorithm to produce a process model. Process model quality was evaluated before and after concatenation using the following metrics: fitness, precision, F-Measure, and complexity. The pre-processed event logs were also used as inputs to the Decay Replay Mining (DREAM) algorithm to predict critical outcomes. We compared predicted results before and after applying the concatenation algorithm using Area Under the Curve (AUC) and Confidence Intervals (CI). Results indicated that the concatenation algorithm improved the quality of the process models and predictions of the critical health outcomes.