Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marius Stan

H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory

Aug 17, 2024

Mario Doumet, Marius Stan, Mathew Hall, Vaughn Betz

Abstract:Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access. Field Programmable Gate Arrays (FPGAs) can achieve low latency and high throughput CNN inference by implementing dataflow accelerators that pipeline layer-specific hardware to implement an entire network. By implementing a different processing element for each CNN layer, these layer-pipelined accelerators can achieve high compute density, but having all layers processing in parallel requires high memory bandwidth. Traditionally this has been satisfied by storing all weights on chip, but this is infeasible for the largest CNNs, which are often those most in need of acceleration. In this work we augment a state-of-the-art dataflow accelerator (HPIPE) to leverage both High-Bandwidth Memory (HBM) and on-chip storage, enabling high performance layer-pipelined dataflow acceleration of large CNNs. Based on profiling results of HBM's latency and throughput against expected address patterns, we develop an algorithm to choose which weight buffers should be moved off chip and how deep the on-chip FIFOs to HBM should be to minimize compute unit stalling. We integrate the new hardware generation within the HPIPE domain-specific CNN compiler and demonstrate good bandwidth efficiency against theoretical limits. Compared to the best prior work we obtain speed-ups of at least 19.4x, 5.1x and 10.5x on ResNet-18, ResNet-50 and VGG-16 respectively.

Via

Access Paper or Ask Questions

Towards Online Steering of Flame Spray Pyrolysis Nanoparticle Synthesis

Oct 16, 2020

Maksim Levental, Ryan Chard, Joseph A. Libera, Kyle Chard, Aarthi Koripelly, Jakob R. Elias, Marcus Schwarting, Ben Blaiszik, Marius Stan, Santanu Chaudhuri(+1 more)

Figure 1 for Towards Online Steering of Flame Spray Pyrolysis Nanoparticle Synthesis

Figure 2 for Towards Online Steering of Flame Spray Pyrolysis Nanoparticle Synthesis

Figure 3 for Towards Online Steering of Flame Spray Pyrolysis Nanoparticle Synthesis

Figure 4 for Towards Online Steering of Flame Spray Pyrolysis Nanoparticle Synthesis

Abstract:Flame Spray Pyrolysis (FSP) is a manufacturing technique to mass produce engineered nanoparticles for applications in catalysis, energy materials, composites, and more. FSP instruments are highly dependent on a number of adjustable parameters, including fuel injection rate, fuel-oxygen mixtures, and temperature, which can greatly affect the quality, quantity, and properties of the yielded nanoparticles. Optimizing FSP synthesis requires monitoring, analyzing, characterizing, and modifying experimental conditions.Here, we propose a hybrid CPU-GPU Difference of Gaussians (DoG)method for characterizing the volume distribution of unburnt solution, so as to enable near-real-time optimization and steering of FSP experiments. Comparisons against standard implementations show our method to be an order of magnitude more efficient. This surrogate signal can be deployed as a component of an online end-to-end pipeline that maximizes the synthesis yield.

Via

Access Paper or Ask Questions

An Experimentally Driven Automated Machine Learned lnter-Atomic Potential for a Refractory Oxide

Sep 09, 2020

Ganesh Sivaraman, Leighanne Gallington, Anand Narayanan Krishnamoorthy, Marius Stan, Gabor Csanyi, Alvaro Vazquez-Mayagoitia, Chris J. Benmore

$Figure 1 for An Experimentally Driven Automated Machine Learned lnter-Atomic Potential for a Refractory Oxide$

$Figure 2 for An Experimentally Driven Automated Machine Learned lnter-Atomic Potential for a Refractory Oxide$

$Figure 3 for An Experimentally Driven Automated Machine Learned lnter-Atomic Potential for a Refractory Oxide$

$Figure 4 for An Experimentally Driven Automated Machine Learned lnter-Atomic Potential for a Refractory Oxide$

Abstract:Understanding the structure and properties of refractory oxides are critical for high temperature applications. In this work, a combined experimental and simulation approach uses an automated closed loop via an active-learner, which is initialized by X-ray and neutron diffraction measurements, and sequentially improves a machine-learning model until the experimentally predetermined phase space is covered. A multi-phase potential is generated for a canonical example of the archetypal refractory oxide, HfO2, by drawing a minimum number of training configurations from room temperature to the liquid state at ~2900oC. The method significantly reduces model development time and human effort.

Via

Access Paper or Ask Questions

Machine Learning Inter-Atomic Potentials Generation Driven by Active Learning: A Case Study for Amorphous and Liquid Hafnium dioxide

Oct 22, 2019

Ganesh Sivaraman, Anand Narayanan Krishnamoorthy, Matthias Baur, Christian Holm, Marius Stan, Gabor Csányi, Chris Benmore, Álvaro Vázquez-Mayagoitia

Figure 1 for Machine Learning Inter-Atomic Potentials Generation Driven by Active Learning: A Case Study for Amorphous and Liquid Hafnium dioxide

Figure 2 for Machine Learning Inter-Atomic Potentials Generation Driven by Active Learning: A Case Study for Amorphous and Liquid Hafnium dioxide

Figure 3 for Machine Learning Inter-Atomic Potentials Generation Driven by Active Learning: A Case Study for Amorphous and Liquid Hafnium dioxide

Figure 4 for Machine Learning Inter-Atomic Potentials Generation Driven by Active Learning: A Case Study for Amorphous and Liquid Hafnium dioxide

Abstract:We propose a novel active learning scheme for automatically sampling a minimum number of uncorrelated configurations for fitting the Gaussian Approximation Potential (GAP). Our active learning scheme consists of an unsupervised machine learning (ML) scheme coupled to Bayesian optimization technique that evaluates the GAP model. We apply this scheme to a Hafnium dioxide (HfO2) dataset generated from a melt-quench ab initio molecular dynamics (AIMD) protocol. Our results show that the active learning scheme, with no prior knowledge of the dataset is able to extract a configuration that reaches the required energy fit tolerance. Further, molecular dynamics (MD) simulations performed using this active learned GAP model on 6144-atom systems of amorphous and liquid state elucidate the structural properties of HfO2 with near ab initio precision and quench rates (i.e. 1.0 K/ps) not accessible via AIMD. The melt and amorphous x-ray structural factors generated from our simulation are in good agreement with experiment. Additionally, the calculated diffusion constants are in good agreement with previous ab initio studies.

* to be submitted NPJ Computational Materials

Via

Access Paper or Ask Questions