Abstract: Approximate computing methods have shown great potential for deep learning. Owing to their reduced hardware costs, these methods are especially suitable for inference tasks on battery-operated devices constrained by their power budget. However, approximate computing has not reached its full potential because of the lack of work on training methods. In this work, we discuss training methods for approximate hardware. We demonstrate how training needs to be specialized for approximate hardware, and we propose methods that speed up the training process by up to 18X.
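A minimal sketch of one such specialization, assuming a simple multiplicative Gaussian error model for the approximate multipliers (the function names and the error model below are hypothetical illustrations, not the paper's method):

    import numpy as np

    rng = np.random.default_rng(0)

    def approx_matmul(a, w, err_std=0.01):
        # Simulate approximate-hardware multiply-accumulate by perturbing
        # the exact product with multiplicative Gaussian error (assumed model).
        exact = a @ w
        return exact * (1.0 + rng.normal(0.0, err_std, size=exact.shape))

    def train_step(x, y, w, lr=1e-2):
        # Approximate forward pass, exact ("straight-through") gradient:
        # the weights adapt to be robust to the simulated hardware error.
        pred = approx_matmul(x, w)
        grad = x.T @ (pred - y) / len(x)
        return w - lr * grad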
Abstract: The last few years have seen substantial work addressing the challenge of low-latency, high-throughput convolutional neural network inference. Integrated photonics has the potential to dramatically accelerate neural networks because of its inherently low latency. Combined with the concept of the Joint Transform Correlator (JTC), the computationally expensive convolution functions can be computed almost instantaneously (at the time of flight of light) and at almost no cost. This 'free' convolution computation provides the theoretical basis of the proposed PhotoFourier JTC-based CNN accelerator. PhotoFourier addresses a myriad of challenges posed by on-chip photonic computing in the Fourier domain, including 1D lenses and high-cost optoelectronic conversions. The proposed PhotoFourier accelerator achieves a more than 28X better energy-delay product than state-of-the-art photonic neural network accelerators.
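Digitally, the principle PhotoFourier exploits is the convolution theorem: multiply in the Fourier domain instead of sliding a kernel. A minimal numpy sketch of that principle (the optical JTC pipeline, lenses, and optoelectronic conversions are not modeled here):

    import numpy as np

    def fourier_conv2d(image, kernel):
        # Convolution theorem: conv(f, g) = IFFT(FFT(f) * FFT(g)).
        # Optically, both Fourier transforms are performed by lenses "for free".
        K = np.fft.fft2(kernel, s=image.shape)   # zero-pad kernel to image size
        return np.real(np.fft.ifft2(np.fft.fft2(image) * K))

    out = fourier_conv2d(np.random.rand(64, 64), np.random.rand(3, 3))
    # Note: this computes circular convolution; padding handles boundary effects.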
Abstract: Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes make it difficult to deploy neural networks efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that includes network compression and acceleration at arbitrary sub-byte precision. The framework can achieve up to 8x compression compared to 8-bit networks by sharing a pool of weights across the entire network. We further propose a bit-serial, lookup-based software implementation that allows a runtime bitwidth tradeoff and achieves more than 2.8x speedup and 7.5x storage compression compared to 8-bit weight-pool networks, with less than 1% accuracy drop.
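A sketch of the two ideas combined, with hypothetical sizes (a pool of 16 shared weight vectors, 8-bit activations); the actual framework's memory layout and lookup tables are more elaborate:

    import numpy as np

    pool = np.random.randn(16, 8).astype(np.float32)   # shared weight pool
    layer_idx = np.random.randint(0, 16, size=32)      # filters store 4-bit indices

    def bit_serial_dot(x_uint8, w):
        # Accumulate one activation bit-plane at a time; skipping high-order
        # planes trades accuracy for speed (the runtime bitwidth tradeoff).
        acc = np.zeros(w.shape[0], dtype=np.float64)
        for b in range(8):
            plane = ((x_uint8 >> b) & 1).astype(np.float64)
            acc += (w @ plane) * (1 << b)
        return acc

    x = np.random.randint(0, 256, size=8).astype(np.uint8)
    partials = bit_serial_dot(x, pool)   # one dot product per pool entry
    outputs = partials[layer_idx]        # every filter reuses them by lookup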
Abstract: Decision-making by artificial neural networks with minimal latency is paramount for numerous applications such as navigation, tracking, and real-time machine-action systems. This requires machine learning hardware that handles multidimensional data with high throughput. Convolution operations, the major computational tool for data classification tasks, unfortunately follow a challenging run-time complexity scaling law. However, implementing the convolution theorem homomorphically in a Fourier-optic display-light-processor enables a non-iterative O(1) runtime complexity for data inputs beyond 1,000 x 1,000 matrices. Following this approach, here we demonstrate data-streaming, multi-kernel image batch processing with a Fourier Convolutional Neural Network (FCNN) accelerator. We show image batch processing of large-scale matrices as passive 2-million-dot-product multiplications performed by digital light-processing modules in the Fourier domain. In addition, we parallelize this optical FCNN system further by utilizing multiple spatio-parallel diffraction orders, thus achieving a 98-times throughput improvement over state-of-the-art FCNN accelerators. A comprehensive discussion of the practical challenges of working at the edge of the system's capabilities highlights issues of crosstalk in the Fourier domain and resolution scaling laws. Accelerating convolutions by utilizing the massive parallelism of display technology brings forth a non-von-Neumann machine learning acceleration.
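The O(1) claim rests on the convolution theorem, which the display-light-processor evaluates passively in the Fourier plane; in its standard 2D form:

    \mathcal{F}\{f * g\}(u, v) = \mathcal{F}\{f\}(u, v) \cdot \mathcal{F}\{g\}(u, v)
    \quad\Longrightarrow\quad
    f * g = \mathcal{F}^{-1}\!\big[\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\big]

Because lenses perform both Fourier transforms at the time of flight of light, the pointwise product in the Fourier plane is the only active step, so the runtime does not grow with the input matrix size.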
Abstract: Solving physical problems with deep learning is accurate and efficient largely owing to the use of elaborate neural networks. We propose a novel hybrid network that integrates two different kinds of neural networks, an LSTM and a ResNet, to overcome the difficulty of solving the strongly oscillating dynamics of a system's time evolution. Taking the double-well model as an example, we show that our new method benefits from pre-learning and verifying the periodicity of the oscillation frequency with the LSTM network, while simultaneously making a high-fidelity prediction of the system's whole dynamics with the ResNet, which cannot be achieved with a single network. Such a hybrid network can be applied to solving cooperative dynamics in systems with fast spatial or temporal modulations, and is promising for realistic oscillation calculations under experimental conditions.
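The abstract leaves the exact architecture unspecified; the PyTorch sketch below only illustrates the division of labor (the LSTM summarizes the oscillation history, residual blocks predict the state at a query time), with all layer sizes hypothetical:

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        def __init__(self, width):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(width, width), nn.Tanh(),
                                     nn.Linear(width, width))
        def forward(self, x):
            return x + self.net(x)          # residual connection

    class HybridLSTMResNet(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
            self.res = nn.Sequential(ResBlock(hidden + 1), nn.Linear(hidden + 1, 1))
        def forward(self, history, t):
            _, (h, _) = self.lstm(history)     # encode past trajectory
            z = torch.cat([h[-1], t], dim=-1)  # append the query time
            return self.res(z)

    model = HybridLSTMResNet()
    hist = torch.randn(4, 100, 1)              # batch of past trajectories
    t = torch.rand(4, 1)                       # times to predict at
    y = model(hist, t)                         # (4, 1) predicted states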
Abstract: Quantization is spearheading the increase in performance and efficiency of neural network computing systems making headway into commodity hardware. We present SWIS (Shared Weight bIt Sparsity), a quantization framework for efficient neural network inference acceleration that delivers improved performance and storage compression through an offline weight-decomposition and scheduling algorithm. SWIS achieves up to a 54.3% (19.8%) point accuracy improvement over weight truncation when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit sparsity in weights. The SWIS accelerator gives up to a 6x speedup and a 1.9x energy improvement over state-of-the-art bit-serial architectures.
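The core idea, approximating each weight by a few of its most significant nonzero bits, can be sketched as follows (the offline scheduling of shared terms across filters is omitted; k and nbits are illustrative):

    import numpy as np

    def top_k_bit_terms(w_int, k=4, nbits=8):
        # Approximate a signed integer weight by its k most significant
        # nonzero bits -- a sketch of bit-sparse weight decomposition.
        sign, mag = np.sign(w_int), abs(int(w_int))
        terms, approx = [], 0
        for b in range(nbits - 1, -1, -1):       # walk bits MSB-first
            if mag & (1 << b):
                terms.append(sign * (1 << b))
                approx += 1 << b
                if len(terms) == k:
                    break
        return terms, sign * approx

    terms, approx = top_k_bit_terms(115, k=2)    # 115 = 0b1110011
    print(terms, approx)                         # [64, 32] 96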
Abstract: Time series forecasting with limited data is a challenging yet critical task. While transformers have achieved outstanding performance in time series forecasting, they often require many training samples due to their large number of trainable parameters. In this paper, we propose a training technique for transformers that prepares the training windows through random sampling. Because input time steps need not be consecutive, the number of distinct samples grows from linear to combinatorial in the series length. By breaking the temporal order, this technique also helps transformers capture dependencies among time steps at a finer granularity. We achieve results competitive with the state of the art on real-world datasets.
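A minimal sketch of the sampling idea, assuming the sampled time stamps are passed to the model alongside the values (the paper's exact positional encoding is not specified in the abstract):

    import numpy as np

    def sample_window(series, num_steps, horizon, rng):
        # Choose a random split point, then draw non-consecutive input steps
        # from the history before it; the forecast target stays consecutive.
        end = int(rng.integers(num_steps, len(series) - horizon))
        idx = np.sort(rng.choice(end, size=num_steps, replace=False))
        x = np.stack([idx, series[idx]], axis=-1)   # (time stamp, value) pairs
        y = series[end:end + horizon]
        return x, y

    rng = np.random.default_rng(0)
    series = np.sin(np.linspace(0.0, 20.0, 500))
    x, y = sample_window(series, num_steps=48, horizon=24, rng=rng)
    # Distinct windows per split point grow as C(end, 48) instead of ~end.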