Particle accelerators are complex systems that focus, guide, and accelerate intense charged particle beams to high energy. Beam diagnostics present a challenging problem due to limited non-destructive measurements, computationally demanding simulations, and inherent uncertainties in the system. We propose a two-step unsupervised deep learning framework named as Conditional Latent Autoregressive Recurrent Model (CLARM) for learning the spatiotemporal dynamics of charged particles in accelerators. CLARM consists of a Conditional Variational Autoencoder (CVAE) transforming six-dimensional phase space into a lower-dimensional latent distribution and a Long Short-Term Memory (LSTM) network capturing temporal dynamics in an autoregressive manner. The CLARM can generate projections at various accelerator modules by sampling and decoding the latent space representation. The model also forecasts future states (downstream locations) of charged particles from past states (upstream locations). The results demonstrate that the generative and forecasting ability of the proposed approach is promising when tested against a variety of evaluation metrics.