Federated learning is a communication-efficient and privacy-preserving solution for training a global model through the collaboration of multiple devices, each with its own local training data set. In this paper, we consider federated learning over massive multiple-input multiple-output (MIMO) communication systems in which wireless devices train a global model with the aid of a central server equipped with a massive antenna array. One major challenge is to design a reception technique at the central server to accurately estimate the local gradient vectors sent from the wireless devices. To overcome this challenge, we propose a novel gradient-estimation algorithm that exploits the sparsity of the local gradient vectors. Inspired by the orthogonal matching pursuit algorithm in compressive sensing, the proposed algorithm iteratively identifies the devices with non-zero gradient values while estimating the transmitted signal based on the linear minimum-mean-square-error (LMMSE) method. Meanwhile, the stopping criterion of the proposed algorithm is designed by deriving an analytical threshold for the estimation error of the transmitted signal. We also analyze the computational complexity reduction of the proposed algorithm over a simple LMMSE method. Simulation results demonstrate that the proposed algorithm achieves performance very close to that of centralized learning, while providing a better performance-complexity tradeoff than linear beamforming methods.
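To make the idea concrete, the following is a minimal sketch of an OMP-style recovery loop combined with LMMSE estimation, of the kind the abstract describes. The function name, the unit-variance signal prior, and the simple residual-energy stopping threshold are illustrative assumptions; the paper's actual algorithm and its analytically derived threshold may differ in detail.

```python
import numpy as np

def omp_lmmse_gradient_estimate(y, H, noise_var, max_support, err_threshold):
    """Illustrative sketch: OMP-style support detection with LMMSE estimation.

    y             : received signal at the massive-antenna server, shape (N,)
    H             : channel matrix, one column per device, shape (N, K)
    noise_var     : receiver noise variance (assumed known)
    max_support   : maximum number of active devices to select
    err_threshold : stopping threshold on the residual energy (a placeholder
                    for the paper's analytically derived threshold)
    """
    N, K = H.shape
    support = []                       # indices of devices detected as active
    x_hat = np.zeros(K, dtype=complex)
    residual = y.astype(complex).copy()
    xs = np.zeros(0, dtype=complex)

    for _ in range(max_support):
        # Greedy step: pick the device whose channel column best matches the
        # residual, excluding devices already in the support.
        corr = np.abs(H.conj().T @ residual)
        corr[support] = -np.inf
        support.append(int(np.argmax(corr)))

        # LMMSE estimate of the signals of the selected devices
        # (unit-variance signal prior assumed for simplicity).
        Hs = H[:, support]
        A = Hs.conj().T @ Hs + noise_var * np.eye(len(support))
        xs = np.linalg.solve(A, Hs.conj().T @ y)

        # Update the residual and check the stopping criterion.
        residual = y - Hs @ xs
        if np.linalg.norm(residual) ** 2 <= err_threshold:
            break

    x_hat[support] = xs
    return x_hat, support
```

Restricting the LMMSE inversion to the detected support is what yields the complexity reduction over applying LMMSE across all devices at once, since the matrix to invert grows only with the number of detected active devices rather than with the total number of devices.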