Abstract: Heart failure (HF) is a critical condition in which accurate prediction of mortality plays a vital role in guiding patient management decisions. However, clinical datasets used for mortality prediction in HF often suffer from an imbalanced class distribution, which poses significant challenges for predictive modeling. In this paper, we explore preprocessing methods for improving one-month mortality prediction in HF patients. We present a comprehensive preprocessing framework whose key techniques are scaling, outlier processing, and resampling. We also employ an aware encoding approach to handle missing values in clinical datasets effectively. Our study uses a comprehensive dataset from the Persian Registry Of cardio Vascular disease (PROVE), which exhibits significant class imbalance. By combining appropriate preprocessing techniques with Machine Learning (ML) algorithms, we aim to improve mortality prediction performance for HF patients. The results show an average improvement of approximately 3.6% in F1 score and 2.7% in Matthews Correlation Coefficient (MCC) for tree-based models, specifically Random Forest (RF) and XGBoost (XGB). This demonstrates the effectiveness of our preprocessing approach in handling Imbalanced Clinical Datasets (ICD). Our findings may help healthcare professionals make informed decisions and improve patient outcomes in HF management.
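F1 score and MCC are preferred over plain accuracy for imbalanced mortality data because a model that never predicts death can still appear highly accurate. The following minimal sketch computes accuracy, F1, and MCC from their standard confusion-matrix definitions in plain Python. The cohort and predictions are hypothetical toy values (5 deaths among 100 patients) chosen for illustration only. They are not drawn from PROVE and do not reproduce the paper's models or results.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, where 1 = positive class (death)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; defined as 0 when any marginal is 0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy cohort: 5 deaths (label 1) among 100 patients.
y_true = [1] * 5 + [0] * 95

# Trivial model that always predicts survival.
always_negative = [0] * 100

# Model that detects 3 of 5 deaths and raises 2 false alarms.
some_model = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

for name, y_pred in [("always_negative", always_negative), ("some_model", some_model)]:
    print(name, accuracy(y_true, y_pred), f1_score(y_true, y_pred), round(mcc(y_true, y_pred), 4))
```

The trivial classifier reaches 0.95 accuracy but scores 0 on both F1 and MCC. The second model's accuracy rises only slightly (0.96), while F1 (0.6) and MCC (about 0.579) clearly reflect that it actually identifies deaths.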