Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sioni Summers

Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP)

Feb 24, 2026

Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst(+109 more)

Abstract:The next generation of particle physics experiments will face a new era of challenges in data acquisition, due to unprecedented data rates and volumes along with extreme environments and operational constraints. Harnessing this data for scientific discovery demands real-time inference and decision-making, intelligent data reduction, and efficient processing architectures beyond current capabilities. Crucial to the success of this experimental paradigm are several emerging technologies, such as artificial intelligence and machine learning (AI/ML) and silicon microelectronics, and the advent of quantum algorithms and processing. Their intersection includes areas of research such as low-power and low-latency devices for edge computing, heterogeneous accelerator systems, reconfigurable hardware, novel codesign and synthesis strategies, readout for cryogenic or high-radiation environments, and analog computing. This white paper presents a community-driven vision to identify and prioritize research and development opportunities in hardware-based ML systems and corresponding physics applications, contributing towards a successful transition to the new data frontier of fundamental science.

* 125 pages, 51 figures

Via

Access Paper or Ask Questions

Sets are all you need: Ultrafast jet classification on FPGAs for HL-LHC

Feb 02, 2024

Patrick Odagiu, Zhiqiang Que, Javier Duarte, Johannes Haller, Gregor Kasieczka, Artur Lobanov, Vladimir Loncar, Wayne Luk, Jennifer Ngadiuba, Maurizio Pierini(+6 more)

Figure 1 for Sets are all you need: Ultrafast jet classification on FPGAs for HL-LHC

Figure 2 for Sets are all you need: Ultrafast jet classification on FPGAs for HL-LHC

Figure 3 for Sets are all you need: Ultrafast jet classification on FPGAs for HL-LHC

Figure 4 for Sets are all you need: Ultrafast jet classification on FPGAs for HL-LHC

Abstract:We study various machine learning based algorithms for performing accurate jet flavor classification on field-programmable gate arrays and demonstrate how latency and resource consumption scale with the input size and choice of algorithm. These architectures provide an initial design for models that could be used for tagging at the CERN LHC during its high-luminosity phase. The high-luminosity upgrade will lead to a five-fold increase in its instantaneous luminosity for proton-proton collisions and, in turn, higher data volume and complexity, such as the availability of jet constituents. Through quantization-aware training and efficient hardware implementations, we show that O(100) ns inference of complex architectures such as deep sets and interaction networks is feasible at a low computational resource cost.

* 13 pages, 3 figures, 3 tables

Via

Access Paper or Ask Questions

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Jul 01, 2022

Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao(+3 more)

Figure 1 for Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Figure 2 for Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Figure 3 for Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Figure 4 for Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Abstract:Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.

* 12 pages, 6 figures, 5 tables

Via

Access Paper or Ask Questions

QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Jun 17, 2022

Alessandro Pappalardo, Yaman Umuroglu, Michaela Blott, Jovan Mitrevski, Ben Hawks, Nhan Tran, Vladimir Loncar, Sioni Summers, Hendrik Borras, Jules Muhizi(+4 more)

Figure 1 for QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Figure 2 for QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Figure 3 for QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Figure 4 for QONNX: Representing Arbitrary-Precision Quantized Neural Networks

Abstract:We present extensions to the Open Neural Network Exchange (ONNX) intermediate representation format to represent arbitrary-precision quantized neural networks. We first introduce support for low precision quantization in existing ONNX-based quantization formats by leveraging integer clipping, resulting in two new backward-compatible variants: the quantized operator format with clipping and quantize-clip-dequantize (QCDQ) format. We then introduce a novel higher-level ONNX format called quantized ONNX (QONNX) that introduces three new operators -- Quant, BipolarQuant, and Trunc -- in order to represent uniform quantization. By keeping the QONNX IR high-level and flexible, we enable targeting a wider variety of platforms. We also present utilities for working with QONNX, as well as examples of its usage in the FINN and hls4ml toolchains. Finally, we introduce the QONNX model zoo to share low-precision quantized neural networks.

* 9 pages, 5 figures, Contribution to 4th Workshop on Accelerated Machine Learning (AccML) at HiPEAC 2022 Conference

Via

Access Paper or Ask Questions

Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

May 16, 2022

Nicolò Ghielmetti, Vladimir Loncar, Maurizio Pierini, Marcel Roed, Sioni Summers, Thea Aarrestad, Christoffer Petersson, Hampus Linander, Jennifer Ngadiuba, Kelvin Lin(+1 more)

Figure 1 for Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Figure 2 for Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Figure 3 for Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Figure 4 for Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Abstract:In this paper, we investigate how field programmable gate arrays can serve as hardware accelerators for real-time semantic segmentation tasks relevant for autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully-on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when increasing the batch size to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show, through aggressive filter reduction and heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, that the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.

* 11 pages, 6 tables, 5 figures

Via

Access Paper or Ask Questions

Lightweight Jet Reconstruction and Identification as an Object Detection Task

Feb 09, 2022

Adrian Alan Pol, Thea Aarrestad, Ekaterina Govorkova, Roi Halily, Anat Klempner, Tal Kopetz, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Olya Sirkin(+1 more)

Figure 1 for Lightweight Jet Reconstruction and Identification as an Object Detection Task

Figure 2 for Lightweight Jet Reconstruction and Identification as an Object Detection Task

Figure 3 for Lightweight Jet Reconstruction and Identification as an Object Detection Task

Figure 4 for Lightweight Jet Reconstruction and Identification as an Object Detection Task

Abstract:We apply object detection techniques based on deep convolutional blocks to end-to-end jet identification and reconstruction tasks encountered at the CERN Large Hadron Collider (LHC). Collision events produced at the LHC and represented as an image composed of calorimeter and tracker cells are given as an input to a Single Shot Detection network. The algorithm, named PFJet-SSD performs simultaneous localization, classification and regression tasks to cluster jets and reconstruct their features. This all-in-one single feed-forward pass gives advantages in terms of execution time and an improved accuracy w.r.t. traditional rule-based methods. A further gain is obtained from network slimming, homogeneous quantization, and optimized runtime for meeting memory and latency constraints of a typical real-time processing environment. We experiment with 8-bit and ternary quantization, benchmarking their accuracy and inference latency against a single-precision floating-point. We show that the ternary network closely matches the performance of its full-precision equivalent and outperforms the state-of-the-art rule-based algorithm. Finally, we report the inference latency on different hardware platforms and discuss future applications.

Via

Access Paper or Ask Questions

Applications and Techniques for Fast Machine Learning in Science

Oct 25, 2021

Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer(+77 more)

Figure 1 for Applications and Techniques for Fast Machine Learning in Science

Figure 2 for Applications and Techniques for Fast Machine Learning in Science

Figure 3 for Applications and Techniques for Fast Machine Learning in Science

Figure 4 for Applications and Techniques for Fast Machine Learning in Science

Abstract:In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

* 66 pages, 13 figures, 5 tables

Via

Access Paper or Ask Questions

Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

Jun 26, 2021

Zhiqiang Que, Erwei Wang, Umar Marikar, Eric Moreno, Jennifer Ngadiuba, Hamza Javed, Bartłomiej Borzyszkowski, Thea Aarrestad, Vladimir Loncar, Sioni Summers(+3 more)

Figure 1 for Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

Figure 2 for Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

Figure 3 for Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

Figure 4 for Accelerating Recurrent Neural Networks for Gravitational Wave Experiments

Abstract:This paper presents novel reconfigurable architectures for reducing the latency of recurrent neural networks (RNNs) that are used for detecting gravitational waves. Gravitational interferometers such as the LIGO detectors capture cosmic events such as black hole mergers which happen at unknown times and of varying durations, producing time-series data. We have developed a new architecture capable of accelerating RNN inference for analyzing time-series data from LIGO detectors. This architecture is based on optimizing the initiation intervals (II) in a multi-layer LSTM (Long Short-Term Memory) network, by identifying appropriate reuse factors for each layer. A customizable template for this architecture has been designed, which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools. The proposed approach has been evaluated based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA. Experimental results show that with balanced II, the number of DSPs can be reduced up to 42% while achieving the same IIs. When compared to other FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower latency.

* Accepted at the 2021 32nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Via

Access Paper or Ask Questions

A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

May 04, 2021

Giuseppe Di Guglielmo, Farah Fahim, Christian Herwig, Manuel Blanco Valentin, Javier Duarte, Cristian Gingu, Philip Harris, James Hirschauer, Martin Kwok, Vladimir Loncar(+8 more)

Figure 1 for A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

Figure 2 for A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

Figure 3 for A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

Figure 4 for A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC

Abstract:Despite advances in the programmable logic capabilities of modern trigger systems, a significant bottleneck remains in the amount of data to be transported from the detector to off-detector logic where trigger decisions are made. We demonstrate that a neural network autoencoder model can be implemented in a radiation tolerant ASIC to perform lossy data compression alleviating the data transmission problem while preserving critical information of the detector energy profile. For our application, we consider the high-granularity calorimeter from the CMS experiment at the CERN Large Hadron Collider. The advantage of the machine learning approach is in the flexibility and configurability of the algorithm. By changing the neural network weights, a unique data compression algorithm can be deployed for each sensor in different detector regions, and changing detector or collider conditions. To meet area, performance, and power constraints, we perform a quantization-aware training to create an optimized neural network hardware implementation. The design is achieved through the use of high-level synthesis tools and the hls4ml framework, and was processed through synthesis and physical layout flows based on a LP CMOS 65 nm technology node. The flow anticipates 200 Mrad of ionizing radiation to select gates, and reports a total area of 3.6 mm^2 and consumes 95 mW of power. The simulated energy consumption per inference is 2.4 nJ. This is the first radiation tolerant on-detector ASIC implementation of a neural network that has been designed for particle physics applications.

* 9 pages, 8 figures, 3 tables

Via

Access Paper or Ask Questions

hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Mar 23, 2021

Farah Fahim, Benjamin Hawks, Christian Herwig, James Hirschauer, Sergo Jindariani, Nhan Tran, Luca P. Carloni, Giuseppe Di Guglielmo, Philip Harris, Jeffrey Krupa(+20 more)

Figure 1 for hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Figure 2 for hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Figure 3 for hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Figure 4 for hls4ml: An Open-Source Codesign Workflow to Empower Scientific Low-Power Machine Learning Devices

Abstract:Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.

* 10 pages, 8 figures, TinyML Research Symposium 2021

Via

Access Paper or Ask Questions