Abstract:A frequent problem in statistical science is how to properly handle missing data in matched paired observations. There is a large body of literature coping with the univariate case. Yet, the ongoing technological progress in measuring biological systems raises the need for addressing more complex data, e.g., graphs, strings and probability distributions, among others. In order to fill this gap, this paper proposes new estimators of the maximum mean discrepancy (MMD) to handle complex matched pairs with missing data. These estimators can detect differences in data distributions under different missingness mechanisms. The validity of this approach is proven and further studied in an extensive simulation study, and results of statistical consistency are provided. Data from continuous glucose monitoring in a longitudinal population-based diabetes study are used to illustrate the application of this approach. By employing the new distributional representations together with cluster analysis, new clinical criteria on how glucose changes vary at the distributional level over five years can be explored.