Abstract: Experimental particle physics demands a sophisticated trigger and acquisition system capable of efficiently retaining the collisions of interest for further investigation. Heterogeneous computing with FPGA cards is emerging as a promising technology for the triggering strategy of the upcoming high-luminosity program of the Large Hadron Collider at CERN. In this context, we present two machine-learning algorithms for selecting events in which neutral long-lived particles decay within the detector volume, studying their accuracy and inference time when accelerated on commercially available Xilinx FPGA accelerator cards. The inference time is also compared with that of CPU- and GPU-based hardware setups. The proposed algorithms prove efficient for the considered benchmark physics scenario, and their accuracy is found not to degrade when accelerated on the FPGA cards. The results indicate that all tested architectures fit within the latency requirements of a second-level trigger farm and that exploiting accelerator technologies for real-time processing of particle-physics collisions is a promising research field that deserves further investigation, in particular with machine-learning models with a large number of trainable parameters.
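The following is a minimal sketch, not the authors' setup, of how the CPU/GPU side of such an inference-latency comparison can be benchmarked for a generic Keras classifier. The model shape, batch size, and run count are illustrative assumptions; the FPGA path, which in practice goes through vendor toolchains for the Xilinx accelerator cards, is not reproduced here.

```python
# Hypothetical latency benchmark for a small classifier; all sizes are
# illustrative assumptions, not taken from the paper.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

batch = np.random.rand(256, 32).astype("float32")
model(batch)  # warm-up call, so graph construction is excluded from timing

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    model(batch, training=False)
latency_ms = (time.perf_counter() - start) / n_runs * 1e3
print(f"mean inference latency per batch: {latency_ms:.2f} ms")
```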
Abstract: Normalizing Flows have emerged as a powerful class of generative models, as they not only allow for efficient sampling of complicated target distributions but also deliver density estimation by construction. We present an in-depth comparison of coupling and autoregressive flows, both of the affine and rational-quadratic-spline type, considering four different architectures: Real-valued Non-Volume Preserving (RealNVP), Masked Autoregressive Flow (MAF), Coupling Rational Quadratic Spline (C-RQS), and Autoregressive Rational Quadratic Spline (A-RQS). We focus on target distributions of increasing complexity, with dimensionality ranging from 4 to 1000. Performance is discussed in terms of different figures of merit: the one-dimensional Wasserstein distance, the one-dimensional Kolmogorov-Smirnov test, the Frobenius norm of the difference between correlation matrices, and the training time. Our results indicate that the A-RQS algorithm stands out both in terms of accuracy and training speed. Nonetheless, all of the algorithms are generally able, without much fine-tuning, to learn complex distributions with limited training data and in a reasonable time, of the order of hours on a Tesla V100 GPU. The only exception is the C-RQS, which takes significantly longer to train and does not always achieve good accuracy. All algorithms have been implemented using TensorFlow2 and TensorFlow Probability and made available on GitHub.
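As a minimal sketch of one of the four compared architectures, the snippet below builds a Masked Autoregressive Flow (MAF) density estimator with TensorFlow Probability. This is not the authors' released code: the number of bijectors, the hidden-layer sizes, and the training hyperparameters are illustrative assumptions, and only the smallest dimensionality quoted in the abstract is used.

```python
# Hypothetical MAF density estimator in TensorFlow Probability;
# all hyperparameters are illustrative assumptions.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

ndims = 4  # smallest dimensionality considered in the study

# Stack affine autoregressive bijectors, permuting dimensions in between
# so every coordinate eventually conditions on every other one.
bijectors = []
for _ in range(5):
    made = tfb.AutoregressiveNetwork(
        params=2, hidden_units=[128, 128], activation="relu")
    bijectors.append(tfb.MaskedAutoregressiveFlow(shift_and_log_scale_fn=made))
    bijectors.append(tfb.Permute(permutation=list(reversed(range(ndims)))))

flow = tfd.TransformedDistribution(
    distribution=tfd.Sample(tfd.Normal(loc=0.0, scale=1.0),
                            sample_shape=[ndims]),
    bijector=tfb.Chain(list(reversed(bijectors[:-1]))))  # drop trailing permute

# Sampling and density estimation both come by construction:
x = flow.sample(1024)        # draw samples from the learned distribution
log_prob = flow.log_prob(x)  # evaluate the log-density of those samples

# Training maximizes the log-likelihood of the target data.
optimizer = tf.optimizers.Adam(1e-3)

@tf.function
def train_step(batch):
    with tf.GradientTape() as tape:
        loss = -tf.reduce_mean(flow.log_prob(batch))
    grads = tape.gradient(loss, flow.trainable_variables)
    optimizer.apply_gradients(zip(grads, flow.trainable_variables))
    return loss
```

The coupling-type flows in the comparison (RealNVP, C-RQS) follow the same pattern with `tfb.RealNVP`, and the spline variants replace the affine transform with `tfb.RationalQuadraticSpline`.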