Abstract:Modelling longitudinal data is an important yet challenging task. These datasets can be high-dimensional, contain non-linear effects and time-varying covariates. Gaussian process (GP) prior-based variational autoencoders (VAEs) have emerged as a promising approach due to their ability to model time-series data. However, they are costly to train and struggle to fully exploit the rich covariates characteristic of longitudinal data, making them difficult for practitioners to use effectively. In this work, we leverage linear mixed models (LMMs) and amortized variational inference to provide conditional priors for VAEs, and propose LMM-VAE, a scalable, interpretable and identifiable model. We highlight theoretical connections between it and GP-based techniques, providing a unified framework for this class of methods. Our proposal performs competitively compared to existing approaches across simulated and real-world datasets.
Abstract:Current approaches to model-based offline Reinforcement Learning (RL) often incorporate uncertainty-based reward penalization to address the distributional shift problem. While these approaches have achieved some success, we argue that this penalization introduces excessive conservatism, potentially resulting in suboptimal policies through underestimation. We identify as an important cause of over-penalization the lack of a reliable uncertainty estimator capable of propagating uncertainties in the Bellman operator. The common approach to calculating the penalty term relies on sampling-based uncertainty estimation, resulting in high variance. To address this challenge, we propose a novel method termed Moment Matching Offline Model-Based Policy Optimization (MOMBO). MOMBO learns a Q-function using moment matching, which allows us to deterministically propagate uncertainties through the Q-function. We evaluate MOMBO's performance across various environments and demonstrate empirically that MOMBO is a more stable and sample-efficient approach.