Abstract:Objective: Systemic lupus erythematosus (SLE) is a complex autoimmune disease characterized by unpredictable flares. This study aimed to develop a novel proteomics-based risk prediction model specifically for Asian SLE populations to enhance personalized disease management and early intervention. Methods: A longitudinal cohort study was conducted over 48 weeks, including 139 SLE patients monitored every 12 weeks. Patients were classified into flare (n = 53) and non-flare (n = 86) groups. Baseline plasma samples underwent data-independent acquisition (DIA) proteomics analysis, and phenome-wide Mendelian randomization (PheWAS) was performed to evaluate causal relationships between proteins and clinical predictors. Logistic regression (LR) and random forest (RF) models were used to integrate proteomic and clinical data for flare risk prediction. Results: Five proteins (SAA1, B4GALT5, GIT2, NAA15, and RPIA) were significantly associated with SLE Disease Activity Index-2K (SLEDAI-2K) scores and 1-year flare risk, implicating key pathways such as B-cell receptor signaling and platelet degranulation. SAA1 demonstrated causal effects on flare-related clinical markers, including hemoglobin and red blood cell counts. A combined model integrating clinical and proteomic data achieved the highest predictive accuracy (AUC = 0.769), surpassing individual models. SAA1 was highlighted as a priority biomarker for rapid flare discrimination. Conclusion: The integration of proteomic and clinical data significantly improves flare prediction in Asian SLE patients. The identification of key proteins and their causal relationships with flare-related clinical markers provides valuable insights for proactive SLE management and personalized therapeutic approaches.
Abstract:This study proposes Evolutionary Causal Discovery (ECD) for causal discovery that tailors response variables, predictor variables, and corresponding operators to research datasets. Utilizing genetic programming for variable relationship parsing, the method proceeds with the Relative Impact Stratification (RIS) algorithm to assess the relative impact of predictor variables on the response variable, facilitating expression simplification and enhancing the interpretability of variable relationships. ECD proposes an expression tree to visualize the RIS results, offering a differentiated depiction of unknown causal relationships compared to conventional causal discovery. The ECD method represents an evolution and augmentation of existing causal discovery methods, providing an interpretable approach for analyzing variable relationships in complex systems, particularly in healthcare settings with Electronic Health Record (EHR) data. Experiments on both synthetic and real-world EHR datasets demonstrate the efficacy of ECD in uncovering patterns and mechanisms among variables, maintaining high accuracy and stability across different noise levels. On the real-world EHR dataset, ECD reveals the intricate relationships between the response variable and other predictive variables, aligning with the results of structural equation modeling and shapley additive explanations analyses.
Abstract:In data imputation, effectively addressing missing values is pivotal, especially in intricate datasets. This paper delves into the FIML Optimized Self-attention (FOSA) framework, an innovative approach that amalgamates the strengths of Full Information Maximum Likelihood (FIML) estimation with the capabilities of self-attention neural networks. Our methodology commences with an initial estimation of missing values via FIML, subsequently refining these estimates by leveraging the self-attention mechanism. Our comprehensive experiments on both simulated and real-world datasets underscore FOSA's pronounced advantages over traditional FIML techniques, encapsulating facets of accuracy, computational efficiency, and adaptability to diverse data structures. Intriguingly, even in scenarios where the Structural Equation Model (SEM) might be mis-specified, leading to suboptimal FIML estimates, the robust architecture of FOSA's self-attention component adeptly rectifies and optimizes the imputation outcomes. Our empirical tests reveal that FOSA consistently delivers commendable predictions, even in the face of up to 40% random missingness, highlighting its robustness and potential for wide-scale applications in data imputation.