Abstract:In this paper, we introduce SpaER, a pioneering method for fetal motion tracking that leverages equivariant filters and self-attention mechanisms to effectively learn spatio-temporal representations. Different from conventional approaches that statically estimate fetal brain motions from pairs of images, our method dynamically tracks the rigid movement patterns of the fetal head across temporal and spatial dimensions. Specifically, we first develop an equivariant neural network that efficiently learns rigid motion sequences through low-dimensional spatial representations of images. Subsequently, we learn spatio-temporal representations by incorporating time encoding and self-attention neural network layers. This approach allows for the capture of long-term dependencies of fetal brain motion and addresses alignment errors due to contrast changes and severe motion artifacts. Our model also provides a geometric deformation estimation that properly addresses image distortions among all time frames. To the best of our knowledge, our approach is the first to learn spatial-temporal representations via deep neural networks for fetal motion tracking without data augmentation. We validated our model using real fetal echo-planar images with simulated and real motions. Our method carries significant potential value in accurately measuring, tracking, and correcting fetal motion in fetal MRI sequences.