
Antonio J. Peña

Enabling Homomorphically Encrypted Inference for Large DNN Models

Apr 29, 2021

Guillermo Lloret-Talavera, Marc Jorda, Harald Servat, Fabian Boemer, Chetan Chauhan, Shigeki Tomishima, Nilesh N. Shah, Antonio J. Peña

Abstract: The proliferation of machine learning services in the last few years has raised data privacy concerns. Homomorphic encryption (HE) enables inference on encrypted data, but it incurs memory and runtime overheads of 100x-10,000x. Secure deep neural network (DNN) inference using HE is currently limited by computing and memory resources, with frameworks requiring hundreds of gigabytes of DRAM to evaluate small models. To overcome these limitations, in this paper we explore the feasibility of leveraging hybrid memory systems composed of DRAM and persistent memory. In particular, we explore the recently released Intel Optane PMem technology and the Intel HE-Transformer nGraph to run large neural networks such as MobileNetV2 (in its largest variant) and ResNet-50 for the first time in the literature. We present an in-depth analysis of the efficiency of the executions with different hardware and software configurations. Our results show that DNN inference using HE exhibits access patterns that are friendly to this memory configuration, yielding efficient executions.

* Manuscript accepted for publication in IEEE Transactions on Computers

cuConv: A CUDA Implementation of Convolution for CNN Inference

Mar 30, 2021

Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

Abstract: Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence they are widely used in production for this purpose. State-of-the-art implementations, however, are inefficient for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced memory accesses without requiring prior data transformations. Our experiments demonstrate that our proposal yields notable performance improvements in a range of common CNN forward-propagation convolution configurations, with speedups of up to 2.29x with respect to the best convolution implementation in cuDNN, hence covering a relevant gap in existing approaches.

* This work has been submitted to Springer for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
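As a rough illustration of what "coalesced accesses without prior data transformations" can mean, here is a plain-Python direct convolution that reads the input tensor in place and never builds an im2col matrix. It is not cuConv's kernel. The function name `direct_conv2d`, the NCHW layout, and the no-padding cross-correlation convention are assumptions chosen for illustration. The innermost loop runs over the output column `q`, the index a GPU kernel would typically assign to consecutive threads. With stride 1, those threads then read consecutive elements of the same input row, and that is the access pattern that coalesces on a GPU.

```python
from typing import List

# Assumed layout for illustration: NCHW, i.e. tensors indexed [n][c][h][w].
Tensor4 = List[List[List[List[float]]]]


def direct_conv2d(x: Tensor4, w: Tensor4, stride: int = 1) -> Tensor4:
    """Direct 2-D convolution (cross-correlation, no padding), no im2col.

    x: input   [N][C][H][W]
    w: filters [K][C][R][S]
    returns    [N][K][P][Q], P = (H - R)//stride + 1, Q = (W - S)//stride + 1
    """
    n_batch, chans = len(x), len(x[0])
    height, width = len(x[0][0]), len(x[0][0][0])
    n_filt, r_sz, s_sz = len(w), len(w[0][0]), len(w[0][0][0])
    out_h = (height - r_sz) // stride + 1
    out_w = (width - s_sz) // stride + 1

    y = [[[[0.0] * out_w for _ in range(out_h)]
          for _ in range(n_filt)] for _ in range(n_batch)]

    for n in range(n_batch):
        for k in range(n_filt):
            for p in range(out_h):
                out_row = y[n][k][p]
                for c in range(chans):
                    for r in range(r_sz):
                        # Read one input row directly from the original tensor.
                        row = x[n][c][p * stride + r]
                        for s in range(s_sz):
                            wv = w[k][c][r][s]
                            # Innermost loop over output columns q. On a GPU, q
                            # would map to consecutive threads; with stride 1 they
                            # read consecutive elements of `row` (coalesced).
                            for q in range(out_w):
                                out_row[q] += wv * row[q * stride + s]
    return y
```

For example, a 3x3 input holding 1..9 convolved with a 2x2 all-ones filter gives `[[12.0, 16.0], [24.0, 28.0]]`. The loop order is the design point: reordering so that `q` is not innermost gives the same result, but on a GPU it would scatter neighboring threads' reads across rows.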