Over-the-air federated learning (OTA-FL) is an emerging technique to reduce the computation and communication overload at the PS caused by the orthogonal transmissions of the model updates in conventional federated learning (FL). This reduction is achieved at the expense of introducing aggregation error that can be efficiently suppressed by means of receive beamforming via large array-antennas. This paper studies OTA-FL in massive multiple-input multiple-output (MIMO) systems by considering a realistic scenario in which the edge server, despite its large antenna array, is restricted in the number of radio frequency (RF)-chains. For this setting, the beamforming for over-the-air model aggregation needs to be addressed jointly with antenna selection. This leads to an NP-hard problem due to the combinatorial nature of the optimization. We tackle this problem via two different approaches. In the first approach, we use the penalty dual decomposition (PDD) technique to develop a two-tier algorithm for joint antenna selection and beamforming. The second approach interprets the antenna selection task as a sparse recovery problem and develops two iterative joint algorithms based on the Lasso and fast iterative soft-thresholding methods. Convergence and complexity analysis is presented for all the schemes. The numerical investigations depict that the algorithms based on the sparse recovery techniques outperform the PDD-based algorithm, when the number of RF-chains at the edge server is much smaller than its array size. However, as the number of RF-chains increases, the PDD approach starts to be superior. Our simulations further depict that learning performance with all the antennas being active at the PS can be closely tracked by selecting less than 20% of the antennas at the PS.