Abstract: Convolutional Neural Networks (CNNs) demonstrate great performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization: they replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, although TWNs offer higher accuracy and better sparsity, research on IMC acceleration for TWNs is limited. Executing TWNs on existing IMC devices is inefficient because the sparsity is not well utilized and the addition operations are not efficient. In this paper, we propose FAT, a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit, which exploits the sparsity of TWNs to skip null operations on zero weights. Second, we propose a fast addition scheme based on the memory Sense Amplifier that avoids the time overhead of both carry propagation and writing the carry back to the memory cells. Third, we propose a Combined-Stationary data mapping that reduces the data movement of both activations and weights and increases the parallelism of memory columns. Simulation results show that, for addition operations at the Sense Amplifier level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area efficiency compared with the state-of-the-art IMC accelerator ParaPIM. FAT achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on networks with 80% sparsity.
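For intuition only, the sketch below (our own Python/NumPy illustration, not the FAT hardware; the function name `ternary_dot` is hypothetical) shows how a dot product with ternary weights needs only additions and subtractions, and how zero weights can be skipped entirely, which is the role a Sparse Addition Control Unit plays in memory.

```python
import numpy as np

def ternary_dot(activations, ternary_weights):
    """Dot product with ternary weights in {-1, 0, +1}.

    Multiplications are replaced by additions/subtractions, and
    zero weights are skipped as null operations (sparsity).
    """
    acc = 0.0
    for a, w in zip(activations, ternary_weights):
        if w == 0:                      # skip null operation on a zero weight
            continue
        acc += a if w > 0 else -a       # add or subtract; no multiply needed
    return acc

# Tiny usage example with mostly-zero ternary weights.
x = np.array([0.5, -1.2, 0.3, 2.0, 0.7])
w = np.array([0, 0, 1, 0, -1])          # ternary weights, 60% zero
print(ternary_dot(x, w))                # 0.3 - 0.7 = -0.4
```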
Abstract: Convolutional Neural Networks (CNNs) demonstrate great capability for multiple tasks, such as image classification. However, considerable resources are required to train a network, so much effort has been devoted to accelerating neural networks by reducing the precision of weights, activations, and gradients. However, these filter-wise quantization methods have a natural upper limit on compression, determined by the kernel size, and with the growing popularity of small kernels this limit decreases further. To address this issue, we propose a new cross-filter compression method that provides $\sim32\times$ memory savings and a $122\times$ speedup in convolution operations. In our method, all convolution filters are quantized to a given bit-width, and spatially adjacent filters share the same scaling factor. Our compression method, built on Binary-Weight Networks and XNOR-Net separately, is evaluated on the CIFAR-10 and ImageNet datasets with widely used network structures, such as ResNet and VGG, and shows tolerable accuracy loss compared with state-of-the-art quantization methods.
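As a rough illustration of the shared-scaling-factor idea (a minimal NumPy sketch, not the paper's code; it assumes the binary case and that the group scale is the mean absolute weight, following the Binary-Weight-Network convention, and the name `quantize_filter_group` is hypothetical):

```python
import numpy as np

def quantize_filter_group(filters):
    """Binary-quantize a group of spatially adjacent filters that
    share one scaling factor (cross-filter sharing, sketched in the
    Binary-Weight-Network style W ~= alpha * sign(W)).

    filters: array of shape (G, C, K, K) -- G filters in one group.
    Returns (alpha, signs); the whole group reuses the single alpha.
    """
    alpha = np.mean(np.abs(filters))    # one scale shared by the group
    signs = np.sign(filters)
    signs[signs == 0] = 1               # map exact zeros to +1
    return alpha, signs

# Usage: reconstruct the group and check the approximation error.
rng = np.random.default_rng(0)
group = rng.normal(size=(4, 3, 3, 3))   # 4 adjacent 3x3 filters, 3 channels
alpha, signs = quantize_filter_group(group)
approx = alpha * signs
print("mean abs error:", np.mean(np.abs(group - approx)))
```

Sharing one scaling factor across a group of adjacent filters is what removes the per-filter storage overhead that bounds filter-wise methods, at the cost of a slightly coarser approximation.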