A spatiotemporal deep learning framework is proposed that is capable of 2D full-field prediction of fracture in concrete mesostructures. This framework not only predicts fractures but also captures the entire history of the fracture process, from the crack initiation in the interfacial transition zone to the subsequent propagation of the cracks in the mortar matrix. In addition, a convolutional neural network is developed which can predict the averaged stress-strain curve of the mesostructures. The UNet modeling framework, which comprises an encoder-decoder section with skip connections, is used as the deep learning surrogate model. Training and test data are generated from high-fidelity fracture simulations of randomly generated concrete mesostructures. These mesostructures include geometric variabilities such as different aggregate particle geometrical features, spatial distribution, and the total volume fraction of aggregates. The fracture simulations are carried out in Abaqus, utilizing the cohesive phase-field fracture modeling technique as the fracture modeling approach. In this work, to reduce the number of training datasets, the spatial distribution of three sets of material properties for three-phase concrete mesostructures, along with the spatial phase-field damage index, are fed to the UNet to predict the corresponding stress and spatial damage index at the subsequent step. It is shown that after the training process using this methodology, the UNet model is capable of accurately predicting damage on the unseen test dataset by using 470 datasets. Moreover, another novel aspect of this work is the conversion of irregular finite element data into regular grids using a developed pipeline. This approach allows for the implementation of less complex UNet architecture and facilitates the integration of phase-field fracture equations into surrogate models for future developments.