Federated learning (FL) over wireless communication channels, specifically, over-the-air (OTA) model aggregation framework is considered. In OTA wireless setups, the adverse channel effects can be alleviated by increasing the number of receive antennas at the parameter server (PS), which performs model aggregation. However, the performance of OTA FL is limited by the presence of mobile users (MUs) located far away from the PS. In this paper, to mitigate this limitation, we propose hierarchical over-the-air federated learning (HOTAFL), which utilizes intermediary servers (IS) to form clusters near MUs. We provide a convergence analysis for the proposed setup, and demonstrate through theoretical and experimental results that local aggregation in each cluster before global aggregation leads to a better performance and faster convergence than OTA FL.