Abstract:Distributed computing enables large-scale computation tasks to be processed over multiple workers in parallel. However, the randomness of communication and computation delays across workers causes the straggler effect, which may degrade the performance. Coded computation helps to mitigate the straggler effect, but the amount of redundant load and their assignment to the workers should be carefully optimized. In this work, we consider a multi-master heterogeneous-worker distributed computing scenario, where multiple matrix multiplication tasks are encoded and allocated to workers for parallel computation. The goal is to minimize the communication plus computation delay of the slowest task. We propose worker assignment, resource allocation and load allocation algorithms under both dedicated and fractional worker assignment policies, where each worker can process the encoded tasks of either a single master or multiple masters, respectively. Then, the non-convex delay minimization problem is solved by employing the Markov's inequality-based approximation, Karush-Kuhn-Tucker conditions, and successive convex approximation methods. Through extensive simulations, we show that the proposed algorithms can reduce the task completion delay compared to the benchmarks, and observe that dedicated and fractional worker assignment policies have different scopes of applications.
Abstract:We consider federated edge learning (FEEL) over wireless fading channels taking into account the downlink and uplink channel latencies, and the random computation delays at the clients. We speed up the training process by overlapping the communication with computation. With fountain coded transmission of the global model update, clients receive the global model asynchronously, and start performing local computations right away. Then, we propose a dynamic client scheduling policy, called MRTP, for uploading local model updates to the parameter server (PS), which, at any time, schedules the client with the minimum remaining upload time. However, MRTP can lead to biased participation of clients in the update process, resulting in performance degradation in non-iid data scenarios. To overcome this, we propose two alternative schemes with fairness considerations, termed as age-aware MRTP (A-MRTP), and opportunistically fair MRTP (OF-MRTP). In A-MRTP, the remaining clients are scheduled according to the ratio between their remaining transmission time and the update age, while in OF-MRTP, the selection mechanism utilizes the long term average channel rate of the clients to further reduce the latency while ensuring fair participation of the clients. It is shown through numerical simulations that OF-MRTP provides significant reduction in latency without sacrificing test accuracy.