Elham E Khoda

wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation

Nov 06, 2025

Benjamin Hawks, Jason Weitz, Dmitri Demler, Karla Tame-Narvaez, Dennis Plotnikov, Mohammad Mehdi Rahimifar, Hamza Ezzaoui Rahali, Audrey C. Therrien, Donovan Sproule, Elham E Khoda(+6 more)

Abstract: As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time required to iterate on various designs. These advancements have solved major obstacles, but also exposed new challenges. For example, processes that were not previously considered bottlenecks, such as hardware synthesis, are becoming limiting factors in the rapid iteration of designs. To mitigate these emerging constraints, multiple efforts have been undertaken to develop an ML-based surrogate model that estimates resource usage of ML accelerator architectures. We introduce wa-hls4ml, a benchmark for ML accelerator resource and latency estimation, and its corresponding initial dataset of over 680,000 fully connected and convolutional neural networks, all synthesized using hls4ml and targeting Xilinx FPGAs. The benchmark evaluates the performance of resource and latency predictors against several common ML model architectures, primarily originating from scientific domains, as exemplar models, and the average performance across a subset of the dataset. Additionally, we introduce GNN- and transformer-based surrogate models that predict latency and resources for ML accelerators. We present the architecture and performance of the models and find that the models generally predict latency and resources for the 75th percentile within several percent of the synthesized resources on the synthetic test dataset.

* 30 pages, 18 figures

Interpreting Transformers for Jet Tagging

Dec 04, 2024

Aaron Wang, Abhijith Gandrakota, Jennifer Ngadiuba, Vivekanand Sahu, Priyansh Bhatnagar, Elham E Khoda, Javier Duarte

Abstract: Machine learning (ML) algorithms, particularly attention-based transformer models, have become indispensable for analyzing the vast data generated by particle physics experiments like ATLAS and CMS at the CERN LHC. Particle Transformer (ParT), a state-of-the-art model, leverages particle-level attention to improve jet-tagging tasks, which are critical for identifying particles resulting from proton collisions. This study focuses on interpreting ParT by analyzing attention heat maps and particle-pair correlations on the $\eta$-$\phi$ plane, revealing a binary attention pattern where each particle attends to at most one other particle. At the same time, we observe that ParT shows varying focus on important particles and subjets depending on decay, indicating that the model learns traditional jet substructure observables. These insights enhance our understanding of the model's internal workings and learning process, offering potential avenues for improving the efficiency of transformer architectures in future high-energy physics applications.

* Accepted at the Machine Learning and the Physical Sciences Workshop, NeurIPS 2024

FAIR Universe HiggsML Uncertainty Challenge Competition

Oct 03, 2024

Wahid Bhimji, Paolo Calafiura, Ragansu Chakkappai, Yuan-Tang Chou, Sascha Diefenbacher, Jordan Dudley, Steven Farrell, Aishik Ghosh, Isabelle Guyon, Chris Harris(+12 more)

Abstract: The FAIR Universe -- HiggsML Uncertainty Challenge focuses on measuring the physics properties of elementary particles with imperfect simulators due to differences in modelling systematic errors. Additionally, the challenge is leveraging a large-compute-scale AI platform for sharing datasets, training models, and hosting machine learning competitions. Our challenge brings together the physics and machine learning communities to advance our understanding and methodologies in handling systematic (epistemic) uncertainties within AI techniques.

* Whitepaper for the FAIR Universe HiggsML Uncertainty Challenge Competition, available at https://fair-universe.lbl.gov

Low Latency Transformer Inference on FPGAs for Physics Applications with hls4ml

Sep 08, 2024

Zhixing Jiang, Dennis Yin, Yihui Chen, Elham E Khoda, Scott Hauck, Shih-Chieh Hsu, Ekaterina Govorkova, Philip Harris, Vladimir Loncar, Eric A. Moreno

Abstract: This study presents an efficient implementation of transformer architectures in Field-Programmable Gate Arrays (FPGAs) using hls4ml. We demonstrate the strategy for implementing the multi-head attention, softmax, and normalization layers and evaluate three distinct models. Their deployment on a VU13P FPGA chip achieved a latency of less than 2 $\mu$s, demonstrating the potential for real-time applications. The compatibility of hls4ml with any TensorFlow-built transformer model further enhances the scalability and applicability of this work. Index Terms: FPGAs, machine learning, transformers, high energy physics, LIGO

FPGA Deployment of LFADS for Real-time Neuroscience Experiments

Feb 02, 2024

Xiaohan Liu, ChiJui Chen, YanLun Huang, LingChi Yang, Elham E Khoda, Yihui Chen, Scott Hauck, Shih-Chieh Hsu, Bo-Cheng Lai

Abstract: Large-scale recordings of neural activity are providing new opportunities to study neural population dynamics. A powerful method for analyzing such high-dimensional measurements is to deploy an algorithm to learn the low-dimensional latent dynamics. LFADS (Latent Factor Analysis via Dynamical Systems) is a deep learning method for inferring latent dynamics from high-dimensional neural spiking data recorded simultaneously in single trials. This method has shown remarkable performance in modeling complex brain signals with an average inference latency in milliseconds. As our capacity for simultaneously recording many neurons is increasing exponentially, it is becoming crucial to build capacity for deploying low-latency inference of the computing algorithms. To improve the real-time processing ability of LFADS, we introduce an efficient implementation of the LFADS models onto Field Programmable Gate Arrays (FPGAs). Our implementation shows an inference latency of 41.97 $\mu$s for processing the data in a single trial on a Xilinx U55C.

* Fast Machine Learning for Science, ICCAD 2023
* 6 pages, 8 figures

Ultra Fast Transformers on FPGAs for Particle Physics Experiments

Feb 01, 2024

Zhixing Jiang, Dennis Yin, Elham E Khoda, Vladimir Loncar, Ekaterina Govorkova, Eric Moreno, Philip Harris, Scott Hauck, Shih-Chieh Hsu

Abstract: This work introduces a highly efficient implementation of the transformer architecture on a Field-Programmable Gate Array (FPGA) by using the hls4ml tool. Given the demonstrated effectiveness of transformer models in addressing a wide range of problems, their application in experimental triggers within particle physics becomes a subject of significant interest. In this work, we have implemented critical components of a transformer model, such as multi-head attention and softmax layers. To evaluate the effectiveness of our implementation, we have focused on a particle physics jet flavor tagging problem, employing a public dataset. We recorded a latency under 2 $\mu$s on the Xilinx UltraScale+ FPGA, which is compatible with hardware trigger requirements at the CERN Large Hadron Collider experiments.

* Machine Learning and the Physical Sciences Workshop, NeurIPS 2023
* 6 pages, 2 figures

Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Jul 01, 2022

Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao(+3 more)

Abstract: Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.

* 12 pages, 6 figures, 5 tables

Physics Community Needs, Tools, and Resources for Machine Learning

Mar 30, 2022

Philip Harris, Erik Katsavounidis, William Patrick McCormack, Dylan Rankin, Yongbin Feng, Abhijith Gandrakota, Christian Herwig, Burt Holzman, Kevin Pedro, Nhan Tran(+11 more)

Abstract: Machine learning (ML) is becoming an increasingly important component of cutting-edge physics research, but its computational requirements present significant challenges. In this white paper, we discuss the needs of the physics community regarding ML across latency and throughput regimes, the tools and resources that offer the possibility of addressing these needs, and how these can be best utilized and accessed in the coming years.

* Contribution to Snowmass 2021, 33 pages, 5 figures

Applications and Techniques for Fast Machine Learning in Science

Oct 25, 2021

Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer(+77 more)

Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

* 66 pages, 13 figures, 5 tables
