
Title: Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

May 10, 2022

Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

Figures 1 to 4 for Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures


Abstract: With the ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex, tiered computing infrastructure that includes edge devices, local hubs, edge datacenters, and cloud datacenters. On the other hand, recent machine learning efforts have provided viable solutions for model compression, pruning, and quantization for heterogeneous environments; for a given machine learning model, we can now easily find or even generate a series of models with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a framework for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level objectives (e.g., throughput, accuracy), JellyBean automatically selects the most cost-efficient models that meet the accuracy target and decides how to deploy them across different tiers of infrastructure. Evaluations show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and of vehicle tracking from the NVIDIA AI City Challenge by up to 36%, compared with state-of-the-art model selection and worker assignment solutions. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) by up to 5x in serving costs.
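The model-selection problem the abstract describes can be illustrated with a minimal sketch. This is not JellyBean's algorithm: it is a brute-force search, and it assumes that end-to-end accuracy is the product of per-stage accuracies and that total cost is the sum of per-stage costs. The `Variant` type, the model names, and all accuracy and cost numbers are hypothetical.

```python
from itertools import product
from typing import NamedTuple, Optional, Sequence, Tuple


class Variant(NamedTuple):
    """One candidate model for a workflow stage (all values hypothetical)."""
    name: str
    accuracy: float  # assumed per-stage accuracy in [0, 1]
    cost: float      # assumed serving cost per request, arbitrary units


def cheapest_combination(
    stages: Sequence[Sequence[Variant]], target_accuracy: float
) -> Optional[Tuple[Variant, ...]]:
    """Return the lowest-cost choice of one variant per stage whose combined
    accuracy meets the target, or None if no combination qualifies.

    Assumption: combined accuracy is the product of stage accuracies.
    """
    best = None
    best_cost = float("inf")
    for combo in product(*stages):  # enumerate every per-stage choice
        accuracy = 1.0
        for variant in combo:
            accuracy *= variant.accuracy
        cost = sum(variant.cost for variant in combo)
        if accuracy >= target_accuracy and cost < best_cost:
            best, best_cost = combo, cost
    return best


# Hypothetical two-stage workflow with a small and a large variant per stage.
stages = [
    [Variant("detector-small", 0.90, 1.0), Variant("detector-large", 0.98, 4.0)],
    [Variant("classifier-small", 0.92, 0.5), Variant("classifier-large", 0.99, 3.0)],
]
choice = cheapest_combination(stages, target_accuracy=0.90)
print([v.name for v in choice] if choice else None)
```

Exhaustive enumeration grows exponentially with the number of stages, so it is only practical for small workflows. The sketch also ignores the second decision the abstract mentions, namely where to place each model across infrastructure tiers.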
