Abstract: Machine learning (ML) is transforming medical research by improving diagnostic accuracy and personalizing treatments. General ML models trained on large datasets identify broad patterns across populations, but their effectiveness is often limited by the diversity of human biology. This has led to interest in subject-specific models that use individual data for more precise predictions. However, these models are costly and challenging to develop. To address this, we propose a novel validation approach that uses a general ML model to ensure reproducible performance and robust feature importance analysis at both the group and subject-specific levels. We tested a single Random Forest (RF) model on nine datasets varying in domain, sample size, and demographics. Different validation techniques were applied to evaluate accuracy and the consistency of feature importance. To introduce variability, we performed up to 400 trials per subject, reseeding the ML algorithm with a new random seed for each trial. This generated up to 400 feature sets per subject, from which we identified the top subject-specific features. A group-specific feature importance set was then derived from all subject-specific results. We compared our approach to conventional validation methods in terms of performance and feature importance consistency. Our repeated-trials approach, with random seed variation, consistently identified key features at the subject level and improved group-level feature importance analysis using a single general model. Subject-specific models address biological variability but are resource-intensive. Our validation technique provides consistent feature importance and improved accuracy within a general ML model, offering a practical and explainable alternative for clinical research.
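The repeated-trials aggregation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the top-k frequency-counting rule, and the toy importance function are all assumptions introduced here for illustration. In practice, the hypothetical `importance_fn` would fit a Random Forest with the given seed (for example scikit-learn's `RandomForestClassifier(random_state=seed)`) and return its `feature_importances_`; a noisy stand-in is used below so the example runs with the standard library alone.

```python
import random
from collections import Counter


def top_k_features(importances, k):
    """Return the names of the k features with the largest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def subject_stable_features(importance_fn, n_trials, k, base_seed=0):
    """Run n_trials seeded trials and keep the k features that most often
    appear in the per-trial top k (assumed aggregation rule)."""
    counts = Counter()
    for trial in range(n_trials):
        # Each trial uses a different seed, mimicking random seed variation.
        importances = importance_fn(seed=base_seed + trial)
        counts.update(top_k_features(importances, k))
    return [name for name, _ in counts.most_common(k)]


def group_stable_features(per_subject_features, k):
    """Derive a group-level set by counting how many subjects selected each feature."""
    counts = Counter()
    for features in per_subject_features.values():
        counts.update(features)
    return [name for name, _ in counts.most_common(k)]


def make_toy_importance_fn(signal, noise=0.05):
    """Hypothetical stand-in for a seeded model fit: a fixed per-feature
    signal plus seed-dependent noise."""
    def importance_fn(seed):
        rng = random.Random(seed)
        return {name: value + rng.uniform(-noise, noise)
                for name, value in signal.items()}
    return importance_fn


# Toy data (invented for illustration): three subjects, five features.
toy_signals = {
    "A": {"f0": 0.5, "f1": 0.3, "f2": 0.1, "f3": 0.05, "f4": 0.05},
    "B": {"f0": 0.1, "f1": 0.5, "f2": 0.3, "f3": 0.05, "f4": 0.05},
    "C": {"f0": 0.3, "f1": 0.5, "f2": 0.1, "f3": 0.05, "f4": 0.05},
}

per_subject = {
    subject: subject_stable_features(make_toy_importance_fn(signal),
                                     n_trials=400, k=2)
    for subject, signal in toy_signals.items()
}
group = group_stable_features(per_subject, k=2)

print(per_subject)
print(group)
```

Counting how often a feature reaches the per-trial top k is only one possible stability criterion; averaging importances or ranks across trials would be an equally plausible reading of the abstract.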
Abstract: Introduction. Low-cost health monitoring devices are increasingly being used in studies of mental health, including stress. While cortisol response magnitude remains the gold standard indicator for stress assessment, a growing number of studies have started to use low-cost EEG devices as primary recorders of biomarker data. Methods. This study reviews published works that contribute or use EEG devices for detecting stress, together with their associated machine learning methods. The reviewed works are selected to answer three general research questions and are then synthesized into four categories: stress assessment using EEG, low-cost EEG devices, available datasets for EEG-based stress measurement, and machine learning techniques for EEG-based stress measurement. Results. A number of studies were identified in which low-cost EEG devices were used to record brain function during phases of stress and relaxation. These studies generally reported high predictive accuracy, verified using a number of different machine learning validation methods and statistical approaches. Of these studies, 60% can be considered low-powered because of the small number of test subjects used during experimentation. Conclusion. Low-cost consumer-grade wearable devices, including EEG and wrist-based monitors, are increasingly being used in stress-related studies. Standardization of EEG signal processing and the importance of sensor location still require further study, and research in this area will continue to provide improvements as more studies become available.