One of the key challenges towards the deployment of over-the-air federated learning (AirFL) is the design of mechanisms that can comply with the power and bandwidth constraints of the shared channel, while causing minimum deterioration to the learning performance as compared to baseline noiseless implementations. For additive white Gaussian noise (AWGN) channels with instantaneous per-device power constraints, prior work has demonstrated the optimality of a power control mechanism based on norm clipping. This was done through the minimization of an upper bound on the optimality gap for smooth learning objectives satisfying the Polyak-{\L}ojasiewicz (PL) condition. In this paper, we make two contributions to the development of AirFL based on norm clipping, which we refer to as AirFL-Clip. First, we provide a convergence bound for AirFLClip that applies to general smooth and non-convex learning objectives. Unlike existing results, the derived bound is free from run-specific parameters, thus supporting an offline evaluation. Second, we extend AirFL-Clip to include Top-k sparsification and linear compression. For this generalized protocol, referred to as AirFL-Clip-Comp, we derive a convergence bound for general smooth and non-convex learning objectives. We argue, and demonstrate via experiments, that the only time-varying quantities present in the bound can be efficiently estimated offline by leveraging the well-studied properties of sparse recovery algorithms.