This paper presents a novel approach to conducting highly efficient federated learning (FL) over a massive wireless edge network, where an edge server and numerous mobile devices (clients) jointly learn a global model without transporting the large volume of data collected by the mobile devices to the edge server. The proposed approach, referred to as spatio-temporal FL (STFL), jointly exploits the spatial and temporal correlations between the learning updates from different mobile devices scheduled to join STFL in various training epochs. The STFL model not only represents the realistic intermittent learning behavior from the edge server to the mobile devices caused by data delivery outage, but also features a mechanism that compensates for lost learning updates in order to mitigate the impact of intermittent learning. An analytical framework of STFL is proposed and employed to study the learning capability of STFL via its convergence performance. In particular, we assess the impact of data delivery outage, intermittent learning mitigation, and statistical heterogeneity of datasets on the convergence performance of STFL. The results provide crucial insights into the design and analysis of STFL-based wireless networks.