Soojeong Kim

Accelerating Multi-Model Inference by Merging DNNs of Different Weights

Sep 28, 2020

Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Yunseong Lee, Byung-Gon Chun

Abstract: Standardized DNN models that have been proven to perform well on machine learning tasks are widely used and often adopted as-is to solve downstream tasks, forming the transfer learning paradigm. However, when serving multiple instances of such DNN models from a cluster of GPU servers, existing techniques to improve GPU utilization such as batching are inapplicable because models often do not share weights due to fine-tuning. We propose NetFuse, a technique for merging multiple DNN models that share the same architecture but have different weights and different inputs. NetFuse is made possible by replacing operations with more general counterparts that allow a set of weights to be associated with only a certain set of inputs. Experiments on ResNet-50, ResNeXt-50, BERT, and XLNet show that NetFuse can speed up DNN inference by up to 3.6x on an NVIDIA V100 GPU, and by up to 3.0x on a TITAN Xp GPU when merging 32 model instances, while using only a small amount of additional GPU memory.
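
The idea of an operation whose weights are tied to only a certain slice of the inputs can be illustrated with a small sketch. This is not the NetFuse implementation: the names `linear` and `grouped_linear`, the toy weights, and the choice of a linear layer are all assumptions made for illustration. It only shows that one "grouped" operation over concatenated inputs gives the same result as running each fine-tuned instance separately.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix) -> Vector:
    """One model's layer: x (length n_in) times w (n_in x n_out)."""
    n_out = len(w[0])
    return [sum(xi * w_row[j] for xi, w_row in zip(x, w)) for j in range(n_out)]


def grouped_linear(x_merged: Vector, weights: List[Matrix]) -> Vector:
    """Hypothetical merged layer: group g's weights see only group g's input slice."""
    n_in = len(weights[0])
    out: Vector = []
    for g, w in enumerate(weights):
        x_slice = x_merged[g * n_in:(g + 1) * n_in]
        out.extend(linear(x_slice, w))
    return out


# Two made-up instances with the same shape (2 -> 2) but different weights.
w_a = [[1.0, 0.0], [0.0, 1.0]]
w_b = [[2.0, 1.0], [0.0, 3.0]]
x_a = [1.0, 2.0]
x_b = [3.0, 4.0]

# Baseline: one call per model instance, outputs concatenated.
separate = linear(x_a, w_a) + linear(x_b, w_b)
# Merged: a single call over the concatenated inputs.
merged = grouped_linear(x_a + x_b, [w_a, w_b])
print(separate == merged)
```

In pure Python the grouped version still loops over the groups, so this sketch shows only the equivalence of results, not any speed-up; the performance figures in the abstract come from the authors' GPU experiments.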

Improving the Expressiveness of Deep Learning Frameworks with Recursion

Sep 04, 2018

Eunji Jeong, Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Byung-Gon Chun

Abstract: Recursive neural networks have been widely used by researchers to handle applications with recursively or hierarchically structured data. However, embedded control flow deep learning frameworks such as TensorFlow, Theano, Caffe2, and MXNet fail to efficiently represent and execute such neural networks due to a lack of support for recursion. In this paper, we add recursion to the programming model of existing frameworks by complementing their design with recursive execution of dataflow graphs as well as additional APIs for recursive definitions. Unlike iterative implementations, which can only understand the topological index of each node in recursive data structures, our recursive implementation is able to exploit the recursive relationships between nodes for efficient execution based on parallel computation. We present an implementation on TensorFlow and evaluation results with various recursive neural network models, showing that our recursive implementation not only conveys the recursive nature of recursive neural networks better than other implementations, but also uses given resources more effectively to reduce training and inference time.

* EuroSys 2018: Thirteenth EuroSys Conference, April 23-26, 2018, Porto, Portugal
* Appeared in EuroSys 2018. 13 pages, 11 figures
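
The contrast the abstract draws between iterative and recursive implementations can be sketched on a toy binary tree. This is a plain-Python illustration, not the paper's TensorFlow API: `Node`, `combine`, `eval_recursive`, and `eval_iterative` are hypothetical names, and `combine` is a stand-in for a learned composition function. The iterative version only follows a fixed topological order, while the recursive version makes it explicit that the two child calls do not depend on each other.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    value: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def combine(left: float, right: float) -> float:
    """Stand-in for a learned composition function (an assumption)."""
    return 0.5 * (left + right)


def is_leaf(node: Node) -> bool:
    # A node missing either child is treated as a leaf in this sketch.
    return node.left is None or node.right is None


def eval_recursive(node: Node) -> float:
    if is_leaf(node):
        return node.value
    # The two child calls are independent of each other, so a runtime
    # that understands the recursion could evaluate them in parallel.
    return combine(eval_recursive(node.left), eval_recursive(node.right))


def eval_iterative(nodes: List[Node]) -> float:
    """Evaluate nodes given in topological order (children before parents)."""
    results: Dict[int, float] = {}
    for node in nodes:
        if is_leaf(node):
            results[id(node)] = node.value
        else:
            results[id(node)] = combine(results[id(node.left)],
                                        results[id(node.right)])
    return results[id(nodes[-1])]  # the last node is assumed to be the root


# Toy tree: ((1, 3), 5)
a, b, c = Node(1.0), Node(3.0), Node(5.0)
inner = Node(left=a, right=b)
root = Node(left=inner, right=c)

print(eval_recursive(root), eval_iterative([a, b, inner, c, root]))
```

Both functions compute the same value; the difference is that the flat loop sees only an ordering of nodes, whereas the recursive definition exposes which subtrees are independent.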
