Abstract:Background: Dementia, marked by cognitive decline, is a global health challenge. Alzheimer's disease (AD), the leading type, accounts for ~70% of cases. Electroencephalography (EEG) measures show promise in identifying AD risk, but obtaining large samples for reliable comparisons is challenging. Objective: This study integrates signal processing, harmonization, and statistical techniques to enhance sample size and improve AD risk classification reliability. Methods: We used advanced EEG preprocessing, feature extraction, harmonization, and propensity score matching (PSM) to balance healthy non-carriers (HC) and asymptomatic E280A mutation carriers (ACr). Data from four databases were harmonized to adjust site effects while preserving covariates like age and sex. PSM ratios (2:1, 5:1, 10:1) were applied to assess sample size impact on model performance. The final dataset underwent machine learning analysis with decision trees and cross-validation for robust results. Results: Balancing sample sizes via PSM significantly improved classification accuracy, ranging from 0.92 to 0.96 across ratios. This approach enabled precise risk identification even with limited samples. Conclusion: Integrating data processing, harmonization, and balancing techniques improves AD risk classification accuracy, offering potential for other neurodegenerative diseases.