Title: An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Aug 30, 2022

Pierrick Pochelu, Serge G. Petiton, Bruno Conche

Abstract: Ensembles of Deep Neural Networks (DNNs) achieve high-quality predictions, but they are compute- and memory-intensive. Demand is therefore growing to make them serve heavy request workloads with the available computational resources. Unlike recent inference servers and inference frameworks, which focus on serving single DNNs, we propose a new software layer that serves ensembles of DNNs flexibly and efficiently. Our inference system is designed with several technical innovations. First, we propose a novel procedure to find a good allocation matrix between devices (CPUs or GPUs) and DNN instances. It successively runs a worst-fit algorithm to allocate DNNs to device memory and a greedy algorithm to optimize the allocation settings and speed up the ensemble. Second, we design the inference system around multiple processes that run asynchronously (batching, prediction, and the combination rule), with an efficient internal communication scheme to avoid overhead. Experiments show flexibility and efficiency under extreme scenarios: the system succeeds in serving an ensemble of 12 heavy DNNs on 4 GPUs and, at the opposite extreme, a single multi-threaded DNN on 16 GPUs. It also outperforms a simple baseline that optimizes the batch size of the DNNs, with a speedup of up to 2.7x on the image classification task.

* Proceedings of IEEE International Conference on Big Data 2022
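
The abstract mentions a worst-fit step that places DNNs into device memory before a greedy pass tunes the allocation. The abstract does not give the details of the authors' procedure, so the following is only a minimal sketch of the generic worst-fit heuristic: each model goes to the device with the most free memory. The function name `worst_fit_allocate`, the largest-first ordering, and the memory figures are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict


def worst_fit_allocate(model_mem: Dict[str, float],
                       device_mem: Dict[str, float]) -> Dict[str, str]:
    """Assign each model to the device that currently has the most free memory.

    Hypothetical helper: a generic worst-fit heuristic, not the paper's code.
    """
    free = dict(device_mem)          # remaining memory per device
    placement: Dict[str, str] = {}
    # Assumption: place the largest models first (worst-fit decreasing).
    for name, need in sorted(model_mem.items(), key=lambda kv: -kv[1]):
        device = max(free, key=free.get)   # device with the most free memory
        if free[device] < need:
            raise MemoryError(f"{name} ({need} GB) does not fit on any device")
        free[device] -= need
        placement[name] = device
    return placement


# Made-up memory footprints in GB, for illustration only.
models = {"a": 6, "b": 5, "c": 4, "d": 3}
devices = {"gpu0": 10, "gpu1": 10}
placement = worst_fit_allocate(models, devices)
print(placement)
```

Worst-fit tends to spread models across devices instead of packing one device full, which leaves headroom on each device for a later step (such as the greedy tuning the abstract describes) to adjust settings like batch size.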
