Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Venkat Dasari

Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Aug 04, 2022

Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

Figure 1 for Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Figure 2 for Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Figure 3 for Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Figure 4 for Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning

Abstract:We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive~$k$, i.e., varying $k$ throughout the runtime of the algorithm. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time that we derive. Then, we propose and implement an algorithm for adaptive distributed SGD that is based on a statistical heuristic. Our results show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations. Moreover, the results also show that the adaptive version is communication-efficient, where the amount of communication required between the master and the workers is less than that of non-adaptive versions.

* arXiv admin note: substantial text overlap with arXiv:2002.11005

Via

Access Paper or Ask Questions

Virtuoso: Video-based Intelligence for real-time tuning on SOCs

Dec 24, 2021

Jayoung Lee, PengCheng Wang, Ran Xu, Venkat Dasari, Noah Weston, Yin Li, Saurabh Bagchi, Somali Chaterji

Figure 1 for Virtuoso: Video-based Intelligence for real-time tuning on SOCs

Figure 2 for Virtuoso: Video-based Intelligence for real-time tuning on SOCs

Figure 3 for Virtuoso: Video-based Intelligence for real-time tuning on SOCs

Figure 4 for Virtuoso: Video-based Intelligence for real-time tuning on SOCs

Abstract:Efficient and adaptive computer vision systems have been proposed to make computer vision tasks, such as image classification and object detection, optimized for embedded or mobile devices. These solutions, quite recent in their origin, focus on optimizing the model (a deep neural network, DNN) or the system by designing an adaptive system with approximation knobs. In spite of several recent efforts, we show that existing solutions suffer from two major drawbacks. First, the system does not consider energy consumption of the models while making a decision on which model to run. Second, the evaluation does not consider the practical scenario of contention on the device, due to other co-resident workloads. In this work, we propose an efficient and adaptive video object detection system, Virtuoso, which is jointly optimized for accuracy, energy efficiency, and latency. Underlying Virtuoso is a multi-branch execution kernel that is capable of running at different operating points in the accuracy-energy-latency axes, and a lightweight runtime scheduler to select the best fit execution branch to satisfy the user requirement. To fairly compare with Virtuoso, we benchmark 15 state-of-the-art or widely used protocols, including Faster R-CNN (FRCNN), YOLO v3, SSD, EfficientDet, SELSA, MEGA, REPP, FastAdapt, and our in-house adaptive variants of FRCNN+, YOLO+, SSD+, and EfficientDet+ (our variants have enhanced efficiency for mobiles). With this comprehensive benchmark, Virtuoso has shown superiority to all the above protocols, leading the accuracy frontier at every efficiency level on NVIDIA Jetson mobile GPUs. Specifically, Virtuoso has achieved an accuracy of 63.9%, which is more than 10% higher than some of the popular object detection models, FRCNN at 51.1%, and YOLO at 49.5%.

* 28 pages, 15 figures, 4 tables, ACM-TODAES

Via

Access Paper or Ask Questions

Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

Feb 25, 2020

Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb

Figure 1 for Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

Figure 2 for Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

Figure 3 for Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers

Abstract:We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers each having a subset of the data. Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays. One solution studied in the literature is to wait at each iteration for the responses of the fastest $k<n$ workers before updating the model, where $k$ is a fixed parameter. The choice of the value of $k$ presents a trade-off between the runtime (i.e., convergence rate) of SGD and the error of the model. Towards optimizing the error-runtime trade-off, we investigate distributed SGD with adaptive $k$. We first design an adaptive policy for varying $k$ that optimizes this trade-off based on an upper bound on the error as a function of the wall-clock time which we derive. Then, we propose an algorithm for adaptive distributed SGD that is based on a statistical heuristic. We implement our algorithm and provide numerical simulations which confirm our intuition and theoretical analysis.

* Accepted to IEEE ICASSP 2020

Via

Access Paper or Ask Questions