Dynamic free-breathing fetal cardiac MRI is one of the most challenging modalities, which requires high temporal and spatial resolution to depict rapid changes in a small fetal heart. The ability of deep learning methods to recover undersampled data could help to optimise the kt-SENSE acquisition strategy and improve non-gated kt-SENSE reconstruction quality. In this work, we explore supervised deep learning networks for reconstruction of kt-SENSE style acquired data using an extensive in vivo dataset. Having access to fully-sampled low-resolution multi-coil fetal cardiac MRI, we study the performance of the networks to recover fully-sampled data from undersampled data. We consider model architectures together with training strategies taking into account their application in the real clinical setup used to collect the dataset to enable networks to recover prospectively undersampled data. We explore a set of modifications to form a baseline performance evaluation for dynamic fetal cardiac MRI on real data. We systematically evaluate the models on coil-combined data to reveal the effect of the suggested changes to the architecture in the context of fetal heart properties. We show that the best-performers recover a detailed depiction of the maternal anatomy on a large scale, but the dynamic properties of the fetal heart are under-represented. Training directly on multi-coil data improves the performance of the models, allows their prospective application to undersampled data and makes them outperform CTFNet introduced for adult cardiac cine MRI. However, these models deliver similar qualitative performances recovering the maternal body very well but underestimating the dynamic properties of fetal heart. This dynamic feature of fast change of fetal heart that is highly localised suggests both more targeted training and evaluation methods might be needed for fetal heart application.