Federated learning (FL) is a novel learning paradigm that addresses the privacy leakage challenge of centralized learning. However, in FL, users with non-independent and identically distributed (non-IID) characteristics can deteriorate the performance of the global model. Specifically, the global model suffers from the weight divergence challenge owing to non-IID data. To address the aforementioned challenge, we propose a novel diffusion strategy of the machine learning (ML) model (FedDif) to maximize the FL performance with non-IID data. In FedDif, users spread local models to neighboring users over D2D communications. FedDif enables the local model to experience different distributions before parameter aggregation. Furthermore, we theoretically demonstrate that FedDif can circumvent the weight divergence challenge. On the theoretical basis, we propose the communication-efficient diffusion strategy of the ML model, which can determine the trade-off between the learning performance and communication cost based on auction theory. The performance evaluation results show that FedDif improves the test accuracy of the global model by 11% compared to the baseline FL with non-IID settings. Moreover, FedDif improves communication efficiency in perspective of the number of transmitted sub-frames and models by 2.77 folds than the latest methods