Abstract: Accurate predictive models are crucial for analysing COVID-19 mortality trends. This study evaluates the impact of a custom data preprocessing pipeline on ten machine learning models predicting COVID-19 mortality using data from Our World in Data (OWID). Our pipeline differs from a standard preprocessing pipeline in four key steps. Firstly, it transforms weekly reported totals into daily updates, correcting reporting biases and providing more accurate estimates. Secondly, it uses localised outlier detection and processing to preserve data variance and enhance accuracy. Thirdly, it exploits computational dependencies among columns to ensure data consistency. Finally, it incorporates an iterative feature selection process to optimise the feature set and improve model performance. Results show a significant improvement with the custom pipeline: its best model, the MLP Regressor, achieved a test RMSE of 66.556 and a test R-squared of 0.991, surpassing the DecisionTree Regressor from the standard pipeline, which had a test RMSE of 222.858 and a test R-squared of 0.817. These findings highlight the importance of tailored preprocessing techniques in enhancing predictive modelling accuracy for COVID-19 mortality. Although developed for this study, these methods offer insights that may help improve predictive performance on other datasets and in other domains.
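The first pipeline step, turning weekly reported totals into daily values, can be illustrated with a minimal sketch. The abstract does not say how the redistribution is done, so the even spread below, the function name `spread_weekly_totals`, and the `period` parameter are all assumptions made for illustration rather than the authors' method. The sketch assumes a series that is zero on non-reporting days and carries the whole period's total on the reporting day.

```python
from typing import List


def spread_weekly_totals(daily_reported: List[float], period: int = 7) -> List[float]:
    """Spread each reported total evenly over the days it covers.

    Hypothetical helper: assumes zeros mark non-reporting days and that a
    positive value is the total for the days since the previous report,
    capped at `period` days.
    """
    result = list(daily_reported)
    last_report = -1  # index of the previous reporting day
    for i, value in enumerate(daily_reported):
        if value > 0:
            # The total covers the days since the last report, at most `period` days.
            start = max(last_report + 1, i - period + 1)
            share = value / (i - start + 1)
            for j in range(start, i + 1):
                result[j] = share
            last_report = i
    return result


# Two weekly reports of 70 and 140 deaths, each filed on the seventh day.
weekly = [0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 140]
daily = spread_weekly_totals(weekly)
```

A series that already reports every day passes through unchanged, because each positive value then covers a single day. The overall total is preserved either way.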
Abstract: We propose an approach, called the Equilibrium Distribution Model (EDM), for automatically selecting colors with optimum perceptual contrast for scientific visualization. Given any number of features that need to be emphasized in a visualization task, our approach derives evenly distributed points in the CIELAB color space and assigns the corresponding colors to the features so that the minimum Euclidean distance among the colors is maximized. Our approach can assign colors with high perceptual contrast even for very high numbers of features, where other color selection methods typically fail. We compare our approach with the widely used Harmonic color selection scheme and demonstrate that while the Harmonic scheme can achieve reasonable color contrast for visualizing up to 20 different features, our Equilibrium scheme provides significantly better contrast and achieves perceptible contrast for visualizing even up to 100 unique features.
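The objective described above, a large minimum Euclidean distance between colors in CIELAB, can be illustrated with a short sketch. This is not the EDM itself, whose procedure the abstract does not detail. It swaps in a simple greedy farthest-point heuristic over a grid of sRGB candidates, and the names `srgb_to_lab`, `select_colors`, `min_lab_distance`, and the `steps` parameter are hypothetical. The conversion uses the standard sRGB to CIELAB formulas with a D65 white point.

```python
import itertools
import math
from typing import List, Tuple

Color = Tuple[float, float, float]


def srgb_to_lab(rgb: Color) -> Color:
    """Convert sRGB components in [0, 1] to CIELAB (D65 white point)."""
    def to_linear(c: float) -> float:
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def select_colors(n: int, steps: int = 6) -> List[Color]:
    """Greedy farthest-point selection (a heuristic stand-in, not the EDM)."""
    levels = [i / (steps - 1) for i in range(steps)]
    candidates = [(rgb, srgb_to_lab(rgb)) for rgb in itertools.product(levels, repeat=3)]
    if not 1 <= n <= len(candidates):
        raise ValueError("n must be between 1 and the number of candidates")
    chosen = [candidates[0]]  # arbitrary start: black
    # nearest[i] is the Lab distance from candidate i to its closest chosen color.
    nearest = [math.dist(lab, chosen[0][1]) for _, lab in candidates]
    while len(chosen) < n:
        idx = max(range(len(candidates)), key=nearest.__getitem__)
        chosen.append(candidates[idx])
        new_lab = candidates[idx][1]
        nearest = [min(d, math.dist(lab, new_lab)) for d, (_, lab) in zip(nearest, candidates)]
    return [rgb for rgb, _ in chosen]


def min_lab_distance(colors: List[Color]) -> float:
    """Smallest pairwise Euclidean distance between the colors in CIELAB."""
    labs = [srgb_to_lab(c) for c in colors]
    return min(math.dist(p, q) for p, q in itertools.combinations(labs, 2))


palette = select_colors(8)
contrast = min_lab_distance(palette)
```

Restricting candidates to an sRGB grid keeps every selected color displayable, at the cost of a coarser search than placing points freely in CIELAB. The minimum distance returned by `min_lab_distance` is the quantity on which two selection schemes would be compared.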