Title: Resource Allocation for Multiuser Edge Inference with Batching and Early Exiting (Extended Version)

Apr 11, 2022

Zhiyan Liu, Qiao Lan, Kaibin Huang

Abstract: The deployment of inference services at the network edge, called edge inference, offloads computation-intensive inference tasks from mobile devices to edge servers, thereby enhancing the devices' capabilities and battery life. In a multiuser system, the joint allocation of communication-and-computation ($\text{C}^\text{2}$) resources (i.e., scheduling and bandwidth allocation) is made challenging by the adoption of two efficient inference techniques, batching and early exiting, and is further complicated by the heterogeneity in users' requirements on accuracy and latency. Batching groups multiple tasks into one batch for parallel processing to reduce time-consuming memory access and thereby boosts the throughput (i.e., completed tasks per second). Early exiting, on the other hand, allows a task to exit from a deep neural network without traversing the whole network, which supports a tradeoff between accuracy and latency. In this work, we study optimal $\text{C}^\text{2}$ resource allocation with batching and early exiting, which is an NP-complete integer program. To tackle this challenge, a set of efficient algorithms is designed under the criterion of maximum throughput. Experimental results demonstrate that both optimal and sub-optimal $\text{C}^\text{2}$ resource allocation algorithms can leverage integrated batching and early exiting to achieve a 200% throughput gain over conventional schemes.

* This is an extended version of a submission to an IEEE journal
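
To make the interplay of batching and early exiting described in the abstract concrete, here is a minimal Python sketch. It is not the paper's system model or algorithm: the exit points in `EXITS`, the two cost constants, and the per-block latency model (a fixed cost shared by the batch plus a cost per active task) are all hypothetical values chosen for illustration, and no resource allocation is performed.

```python
from dataclasses import dataclass

@dataclass
class Task:
    user: str
    min_accuracy: float  # accuracy the user requires

# Hypothetical exit points: (blocks traversed, accuracy at that exit).
EXITS = [(1, 0.70), (2, 0.85), (3, 0.92)]

# Assumed toy costs per network block, in milliseconds.
FIXED_MS_PER_BLOCK = 4.0     # paid once per batch (e.g. loading weights)
PER_TASK_MS_PER_BLOCK = 0.5  # paid once per task still in the batch

def earliest_exit(task):
    """Shallowest exit (in blocks) meeting the accuracy requirement, or None."""
    for blocks, accuracy in EXITS:
        if accuracy >= task.min_accuracy:
            return blocks
    return None

def batch_latency_ms(exit_blocks):
    """Latency of one batch in which task i leaves after exit_blocks[i] blocks."""
    if not exit_blocks:
        return 0.0
    total = 0.0
    for block in range(1, max(exit_blocks) + 1):
        # Tasks that already exited no longer add per-task cost.
        active = sum(1 for e in exit_blocks if e >= block)
        total += FIXED_MS_PER_BLOCK + PER_TASK_MS_PER_BLOCK * active
    return total

tasks = [Task("a", 0.65), Task("b", 0.80), Task("c", 0.90), Task("d", 0.92)]
exits = [earliest_exit(t) for t in tasks]             # [1, 2, 3, 3]
full_depth = [EXITS[-1][0]] * len(tasks)              # no early exiting

batched_early = batch_latency_ms(exits)               # one batch, early exits
batched_full = batch_latency_ms(full_depth)           # one batch, full network
serial_full = sum(batch_latency_ms([d]) for d in full_depth)  # one task at a time

for name, latency in [("serial, full depth", serial_full),
                      ("batched, full depth", batched_full),
                      ("batched, early exit", batched_early)]:
    print(f"{name}: {latency} ms, {1000 * len(tasks) / latency:.1f} tasks/s")
```

Under these assumed costs, batching amortizes the fixed per-block cost across tasks, and early exiting shrinks the batch as it moves deeper into the network. The size of the gain depends entirely on the chosen constants.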
