Federated learning (FL) is a collaborative machine learning paradigm that enables deep learning models to be trained on large volumes of decentralized data residing on mobile devices without accessing clients' private data. Driven by the ever-increasing demand for model training on mobile applications and devices, the vast majority of FL tasks are carried out over wireless fading channels. Because wireless channels are time-varying, however, random delays occur in both the uplink and downlink transmissions of FL. Analyzing the overall time consumption of a wireless FL task, or more specifically, the delay distribution of FL, is therefore an important yet challenging open problem, especially for delay-sensitive model training. In this paper, we present a unified framework for calculating the approximate delay distributions of FL over arbitrary fading channels. Specifically, saddle point approximation, extreme value theory (EVT), and large deviation theory (LDT) are jointly exploited to find the approximate delay distribution along with its tail distribution, which characterizes the quality of service of a wireless FL system. Simulation results demonstrate that our approximation method achieves a small approximation error, which vanishes as the training accuracy increases.
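To give a rough sense of what the saddle point step computes, here is a minimal sketch that illustrates only the generic Lugannani-Rice saddle point tail approximation. It is not the framework described above. It assumes, purely hypothetically, that the total training delay is a sum of `n` i.i.d. exponentially distributed per-round delays with rate `rate`. Under that assumption the exact tail is known (the Erlang distribution), so the approximation can be checked against it. The function names `erlang_tail` and `saddlepoint_tail` and the exponential delay model are illustrative assumptions. The EVT and LDT components are not shown.

```python
import math


def erlang_tail(x: float, n: int, rate: float) -> float:
    """Exact P(S > x) for S = sum of n i.i.d. Exp(rate) delays.

    Hypothetical delay model used only to check the approximation:
    P(S > x) = sum_{k=0}^{n-1} exp(-rate*x) (rate*x)^k / k!
    """
    lam_x = rate * x
    term = math.exp(-lam_x)  # k = 0 term of the Poisson sum
    total = term
    for k in range(1, n):
        term *= lam_x / k  # build (rate*x)^k / k! incrementally
        total += term
    return total


def saddlepoint_tail(x: float, n: int, rate: float) -> float:
    """Lugannani-Rice saddle point approximation of P(S > x).

    Uses the cumulant generating function of the assumed sum of n
    Exp(rate) delays: K(t) = -n * log(1 - t / rate), valid for t < rate.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    # Saddle point t_hat solves K'(t) = n / (rate - t) = x.
    t_hat = rate - n / x
    if abs(t_hat) < 1e-12:
        # The formula is singular at the mean n / rate.
        raise ValueError("x equals the mean n/rate; formula is singular there")
    cgf = -n * math.log(1.0 - t_hat / rate)   # K(t_hat)
    k2 = n / (rate - t_hat) ** 2              # K''(t_hat)
    w = math.copysign(math.sqrt(2.0 * (t_hat * x - cgf)), t_hat)
    u = t_hat * math.sqrt(k2)
    normal_tail = 0.5 * math.erfc(w / math.sqrt(2.0))          # 1 - Phi(w)
    normal_pdf = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)  # phi(w)
    return normal_tail + normal_pdf * (1.0 / u - 1.0 / w)


if __name__ == "__main__":
    # Compare the exact and approximate tails for a hypothetical 20-round task.
    for x in (10.0, 15.0, 30.0, 40.0):
        exact = erlang_tail(x, 20, 1.0)
        approx = saddlepoint_tail(x, 20, 1.0)
        print(f"x={x:5.1f}  exact={exact:.6e}  saddle point={approx:.6e}")
```

The exponential model was chosen only because it gives a closed-form reference. For real fading channels the per-round delay distribution, and hence the cumulant generating function, would differ.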