Federated learning (FL) is an emerging machine learning paradigm for training models across multiple edge devices holding local datasets, without explicitly exchanging the data. Recently, over-the-air (OTA) FL has been suggested to reduce bandwidth and energy consumption by allowing the users to transmit their model updates simultaneously over a multiple access channel (MAC). However, this approach lets channel noise directly affect the optimization procedure, which may degrade the accuracy of the trained model. In this paper, we jointly exploit the prior distribution of the local weights and the channel distribution, and develop an OTA FL algorithm based on a Bayesian approach to signal aggregation. Our proposed algorithm, dubbed Bayesian Air Aggregation Federated learning (BAAF), is shown to effectively mitigate the noise and fading induced by the channel. To handle the statistical heterogeneity of the users' data, a second major challenge in FL, we extend BAAF to allow for appropriate local updates by the users, yielding the Controlled Bayesian Air Aggregation Federated learning (COBAAF) algorithm. In addition to using a Bayesian approach to average the channel output, COBAAF controls the drift in local updates using a judicious design of correction terms. We analyze the convergence of the global model learned by BAAF and COBAAF in noisy and heterogeneous environments, showing their ability to achieve a convergence rate similar to that attained over error-free channels. Simulation results demonstrate the improved convergence of BAAF and COBAAF over existing algorithms on machine learning tasks.
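To make the Bayesian aggregation step concrete, the following is a minimal sketch of an LMMSE-style denoiser for the over-the-air sum, under the simplifying assumptions of i.i.d. Gaussian priors on the per-coordinate local updates and an AWGN MAC with unit channel gains. The function and parameter names (`baaf_aggregate`, `mu_w`, `var_w`, `var_noise`) are illustrative, not the paper's implementation, which handles more general fading channels.

```python
import numpy as np

def baaf_aggregate(y, num_users, mu_w, var_w, var_noise):
    """Bayesian (LMMSE) estimate of the average local update from the
    noisy over-the-air sum  y = sum_k w_k + n  received over the MAC.

    Assumes (illustrative, not the paper's general setting):
      - per-coordinate priors  w_k ~ N(mu_w, var_w), i.i.d. across users,
      - additive channel noise n ~ N(0, var_noise), unit channel gains.
    """
    prior_mean_sum = num_users * mu_w                    # E[sum_k w_k]
    prior_var_sum = num_users * var_w                    # Var[sum_k w_k]
    gain = prior_var_sum / (prior_var_sum + var_noise)   # LMMSE shrinkage toward the prior
    est_sum = prior_mean_sum + gain * (y - prior_mean_sum)
    return est_sum / num_users                           # estimate of the average update

# Toy demo: 10 users, 1000 model coordinates, noisy MAC.
rng = np.random.default_rng(0)
K, d = 10, 1000
mu_w, var_w, var_noise = 0.0, 1.0, 25.0
w = rng.normal(mu_w, np.sqrt(var_w), size=(K, d))               # local updates
y = w.sum(axis=0) + rng.normal(0.0, np.sqrt(var_noise), size=d) # OTA superposition + noise
true_avg = w.mean(axis=0)
print("naive MSE:", np.mean((y / K - true_avg) ** 2))
print("Bayesian MSE:", np.mean((baaf_aggregate(y, K, mu_w, var_w, var_noise) - true_avg) ** 2))
```

In this toy setting the Bayesian estimate shrinks the noisy channel output toward the prior mean, trading a small bias for a large variance reduction, which is the mechanism by which the prior and channel distributions jointly mitigate the channel noise.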