Abstract:Research in Multiple Sclerosis (MS) has recently focused on extracting knowledge from real-world clinical data sources. This type of data is more abundant than data produced during clinical trials and potentially more informative about real-world clinical practice. However, this comes at the cost of less curated and controlled data sets. In this work, we address the task of optimally extracting information from longitudinal patient data in the real-world setting with a special focus on the sporadic sampling problem. Using the MSBase registry, we show that with machine learning methods suited for patient trajectories modeling, such as recurrent neural networks and tensor factorization, we can predict disability progression of patients in a two-year horizon with an ROC-AUC of 0.86, which represents a 33% decrease in the ranking pair error (1-AUC) compared to reference methods using static clinical features. Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction.