We present a method for learning latent stochastic differential equations (SDEs) from high dimensional time series data. Given a time series generated from a lower dimensional It\^{o} process, the proposed method uncovers the relevant parameters of the SDE through a self-supervised learning approach. Using the framework of variational autoencoders (VAEs), we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of semi-supervised learning to show that our model can recover not only the underlying SDE parameters, but also the original latent space, up to an isometry, in the limit of infinite data. We validate the model through a series of different simulated video processing tasks where the underlying SDE is known. Our results suggest that the proposed method effectively learns the underlying SDE, as predicted by the theory.