In this paper, we consider decentralized federated learning (FL) over wireless networks, where over-the-air computation (AirComp) is adopted to facilitate the local model consensus in a device-to-device (D2D) communication manner. However, the AirComp-based consensus phase brings the additive noise in each algorithm iterate and the consensus needs to be robust to wireless network topology changes, which introduce a coupled and novel challenge of establishing the convergence for wireless decentralized FL algorithm. To facilitate consensus phase, we propose an AirComp-based DSGD with gradient tracking and variance reduction (DSGT-VR) algorithm, where both precoding and decoding strategies are developed for D2D communication. Furthermore, we prove that the proposed algorithm converges linearly and establish the optimality gap for strongly convex and smooth loss functions, taking into account the channel fading and noise. The theoretical result shows that the additional error bound in the optimality gap depends on the number of devices. Extensive simulations verify the theoretical results and show that the proposed algorithm outperforms other benchmark decentralized FL algorithms over wireless networks.