The increasing cloudification and softwarization of networks foster the interplay among multiple independently managed deployments. An appealing reason for such an interplay lies in distributed Machine Learning (ML), which allows the creation of robust ML models by leveraging collective intelligence and computational power. In this paper, we study the application of the two cornerstones of distributed learning, namely Federated Learning (FL) and Knowledge Distillation (KD), on the Wi-Fi Access Point (AP) load prediction use case. The analysis conducted in this paper is done on a dataset that contains real measurements from a large Wi-Fi campus network, which we use to train the ML model under study based on different strategies. Performance evaluation includes relevant aspects for the suitability of distributed learning operation in real use cases, including the predictive performance, the associated communication overheads, or the energy consumption. In particular, we prove that distributed learning can improve the predictive accuracy centralized ML solutions by up to 93% while reducing the communication overheads and the energy cost by 80%.