In the study of behavior-related brain computation, raw neural population activities must be meaningfully aligned despite the drastic shifts between them. This alignment is non-trivial, however, because most neural population activities are multivariate time series. An influential framework in neuroscience posits that trial-based neural population activities rely on low-dimensional latent dynamics, and focusing on these latent dynamics greatly facilitates alignment. Despite considerable progress, existing methods usually ignore the intrinsic spatio-temporal structure of the latent dynamics, which degrades both the quality of the aligned dynamics structures and overall performance after alignment. To tackle this problem, we propose an alignment method that leverages the expressiveness of diffusion models. Specifically, a diffusion model first extracts the latent dynamics structures of the source domain. These structures are then recovered on the target domain through a maximum likelihood alignment procedure. We first demonstrate the effectiveness of the proposed method on a synthetic dataset. Then, when applied to neural recordings from primate motor cortex under both cross-day and inter-subject settings, our method consistently preserves the spatio-temporal structure of the latent dynamics and outperforms existing approaches in alignment quality.