Abstract: The emerging machine learning paradigm of decentralized federated learning (DFL) promises to greatly boost the deployment of artificial intelligence (AI) by learning directly across distributed agents without centralized coordination. Despite significant efforts to improve the communication efficiency of DFL, most existing solutions rest on the simplistic assumption that neighboring agents are physically adjacent in the underlying communication network. This assumption fails to capture the true communication cost when learning over a general bandwidth-limited network, as encountered in many edge networks. In this work, we address this gap by leveraging recent advances in network tomography to jointly design the communication demands and the communication schedule for overlay-based DFL in bandwidth-limited networks, without requiring explicit cooperation from the underlying network. By carefully analyzing the structure of our problem, we decompose it into a series of optimization problems that can each be solved efficiently and that collectively minimize the total training time. Extensive data-driven simulations show that our solution can significantly accelerate DFL compared with state-of-the-art designs.