Daniel Roy

Adaptive Gradient Quantization for Data-Parallel SGD

Oct 23, 2020

Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel Roy, Ali Ramezani-Kebrya

Abstract: Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.

* Accepted at the conference on Neural Information Processing Systems (NeurIPS 2020)
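
For readers unfamiliar with gradient quantization, the following is a minimal sketch of unbiased stochastic rounding onto a fixed set of levels. It is not the ALQ or AMQ method from the paper: the function name `stochastic_quantize`, the level set, and the L2 normalization are assumptions chosen for illustration, and the levels here stay fixed, whereas the abstract describes schemes that adapt them during training.

```python
import bisect
import math
import random


def stochastic_quantize(grad, levels, rng):
    """Round each normalized coordinate to a neighboring level, without bias.

    grad:   list of floats (one gradient vector).
    levels: sorted list of floats with levels[0] == 0.0 and levels[-1] == 1.0
            (a hypothetical fixed level set; adaptive schemes would update it).
    rng:    random.Random instance used for the stochastic rounding.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    quantized = []
    for g in grad:
        r = min(abs(g) / norm, 1.0)  # normalized magnitude in [0, 1]
        hi = bisect.bisect_left(levels, r)  # index of the first level >= r
        if levels[hi] == r:
            q = r  # already on a level, nothing to round
        else:
            lo = hi - 1
            # Round up with probability proportional to the distance from
            # the lower level, so that the expected value of q equals r.
            p_up = (r - levels[lo]) / (levels[hi] - levels[lo])
            q = levels[hi] if rng.random() < p_up else levels[lo]
        quantized.append(math.copysign(norm * q, g))
    return quantized


if __name__ == "__main__":
    rng = random.Random(0)
    print(stochastic_quantize([3.0, -4.0], [0.0, 0.5, 1.0], rng))
```

In a data-parallel setting, each worker would send only the norm, the signs, and the level indices, which is where the communication savings come from. Averaging many quantized copies recovers the original gradient in expectation.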

The Infinite Latent Events Model

May 09, 2012

David Wingate, Noah Goodman, Daniel Roy, Joshua Tenenbaum

Abstract: We present the Infinite Latent Events Model, a nonparametric hierarchical Bayesian distribution over infinite dimensional Dynamic Bayesian Networks with binary state representations and noisy-OR-like transitions. The distribution can be used to learn structure in discrete timeseries data by simultaneously inferring a set of latent events, which events fired at each timestep, and how those events are causally linked. We illustrate the model on a sound factorization task, a network topology identification task, and a video game task.

* Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI 2009)
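
As background for the "noisy-OR-like transitions" mentioned in the abstract, here is a small sketch of a standard noisy-OR gate over binary parents. It is only the textbook form, not the model's actual transition distribution; the names `noisy_or`, `weights`, and `leak` are assumptions introduced for illustration.

```python
def noisy_or(active, weights, leak=0.0):
    """Probability that a binary event fires under a standard noisy-OR gate.

    active:  iterable of 0/1 flags, one per parent event at the previous step.
    weights: iterable of floats in [0, 1]; weights[i] is the chance that
             parent i alone triggers the event (hypothetical parameter name).
    leak:    chance that the event fires with no active parent.
    """
    p_off = 1.0 - leak
    for on, w in zip(active, weights):
        if on:
            # Each active parent independently fails to trigger the event.
            p_off *= 1.0 - w
    return 1.0 - p_off


if __name__ == "__main__":
    print(noisy_or([1, 1], [0.5, 0.5]))
```

The event stays off only if every active parent, and the leak term, independently fails to trigger it, so adding active parents can only raise the firing probability.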

Bayesian Agglomerative Clustering with Coalescents

Jul 04, 2009

Yee Whye Teh, Hal Daumé III, Daniel Roy

Abstract: We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.

* NIPS 2008
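
To make the tree prior concrete, the sketch below draws one tree from Kingman's coalescent: with k lineages remaining, the waiting time to the next merge is exponential with rate k(k-1)/2, and the pair that merges is chosen uniformly at random. This samples from the prior only and does not implement the paper's greedy or sequential Monte Carlo inference; the function name and the return format are assumptions for illustration.

```python
import random


def sample_kingman_coalescent(n, rng):
    """Sample a binary tree over leaves 0..n-1 from Kingman's coalescent.

    Returns a list of (time, left, right) merges, where time is measured
    backward from the leaves and left/right are tuples of leaf indices.
    """
    lineages = [(i,) for i in range(n)]
    t = 0.0
    merges = []
    while len(lineages) > 1:
        k = len(lineages)
        rate = k * (k - 1) / 2.0  # one unit of rate per unordered pair
        t += rng.expovariate(rate)
        i, j = rng.sample(range(k), 2)  # uniformly chosen distinct pair
        a, b = lineages[i], lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(tuple(sorted(a + b)))
        merges.append((t, a, b))
    return merges


if __name__ == "__main__":
    for time, left, right in sample_kingman_coalescent(5, random.Random(0)):
        print(round(time, 3), left, right)
```

Because the merge rate grows quadratically with the number of lineages, early merges near the leaves tend to happen quickly and the last few merges near the root take the longest.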
