This paper proposes a method for detecting anomalies in video data. A Variational Autoencoder (VAE) is used for reducing the dimensionality of video frames, generating latent space information that is comparable to low-dimensional sensory data (e.g., positioning, steering angle), making feasible the development of a consistent multi-modal architecture for autonomous vehicles. An Adapted Markov Jump Particle Filter defined by discrete and continuous inference levels is employed to predict the following frames and detecting anomalies in new video sequences. Our method is evaluated on different video scenarios where a semi-autonomous vehicle performs a set of tasks in a closed environment.