Title: Information Theoretic Limits of Data Shuffling for Distributed Learning

Sep 16, 2016

Mohamed Attia, Ravi Tandon

Abstract: Data shuffling is one of the fundamental building blocks of distributed learning algorithms, and it increases the statistical gain for each step of the learning process. In each iteration, a central node assigns different shuffled data points to a distributed set of workers to perform local computations, which leads to communication bottlenecks. The focus of this paper is on formalizing and understanding the fundamental information-theoretic trade-off between storage (per worker) and the worst-case communication overhead for the data shuffling problem. We completely characterize the information-theoretic trade-off for $K=2$ and $K=3$ workers, for any value of storage capacity, and show that increasing the storage across workers can reduce the communication overhead by leveraging coding. We propose a novel and systematic data delivery and storage update strategy for each data shuffle iteration, which preserves the structural properties of the storage across the workers and aids in minimizing the communication overhead in subsequent data shuffling iterations.
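To illustrate how coding can reduce communication when workers already hold data, here is a minimal toy sketch for $K=2$ workers. It is not the delivery scheme proposed in the paper; it only shows the general idea of a coded broadcast that is decoded with locally stored side information. The function names (`xor_bytes`, `swap_without_coding`, `swap_with_coding`) and the sample byte strings are hypothetical, and the sketch assumes the two data points have equal length and that the central node broadcasts to both workers at once.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("data points must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def swap_without_coding(a: bytes, b: bytes):
    """Uncoded shuffle: the central node sends each point separately.

    Worker 1 currently stores `a` and is assigned `b`; worker 2 stores `b`
    and is assigned `a`. Returns (worker 1's new point, worker 2's new
    point, total bytes sent).
    """
    return b, a, len(a) + len(b)


def swap_with_coding(a: bytes, b: bytes):
    """Coded shuffle: the central node broadcasts the single message a XOR b.

    Each worker decodes its newly assigned point by XOR-ing the broadcast
    with the point it already stores.
    """
    message = xor_bytes(a, b)
    worker1_new = xor_bytes(message, a)  # worker 1 holds a, recovers b
    worker2_new = xor_bytes(message, b)  # worker 2 holds b, recovers a
    return worker1_new, worker2_new, len(message)


if __name__ == "__main__":
    point_a, point_b = b"sample-A", b"sample-B"  # hypothetical data points
    print(swap_without_coding(point_a, point_b)[2])  # bytes sent, uncoded
    print(swap_with_coding(point_a, point_b)[2])     # bytes sent, coded
```

In this toy case both workers end up with the correct new point, and the coded broadcast carries half as many bytes as sending the two points separately, because each worker uses what it already stores to decode.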

* To be presented at IEEE GLOBECOM, December 2016
