Fetal magnetic resonance imaging (MRI) is challenged by uncontrollable, large, and irregular fetal movements. Fetal MRI is performed in a fully interactive manner, in which a technologist monitors motion to prescribe slices at right angles with respect to the anatomy of interest. Current practice involves repeated acquisitions to ensure that diagnostic-quality images are acquired, and the scans are retrospectively registered slice-by-slice to reconstruct 3D images. Nonetheless, manual monitoring of 3D fetal motion based on displayed 2D slices, and navigation at the level of stacks-of-slices (instead of slices), is sub-optimal and inefficient. The current process is highly operator-dependent, requires extensive training, and significantly increases the length of fetal MRI scans, which makes them costly and difficult for pregnant women. With that motivation, we present a new real-time image-based motion tracking technique in MRI using deep learning that can significantly improve on the state of the art. Through a combination of spatial and temporal encoder-decoder networks, our system learns to predict the 3D pose of the fetal head based on the dynamics of motion inferred directly from sequences of acquired slices. Compared to recent works that estimate the static 3D pose of the subject from slices, our method learns to predict the dynamics of 3D motion. We compared our trained network on held-out test sets (including data with different characteristics, e.g., different age ranges, and motion trajectories recorded from volunteer subjects) with networks designed for estimation as well as methods adopted to make predictions. The results of all estimation and prediction tasks show that we achieved reliable motion tracking in fetal MRI. This technique can be augmented with deep-learning-based fast anatomy detection, segmentation, and image registration techniques to build real-time motion tracking and navigation systems.