SLAC National Accelerator Laboratory, Menlo Park, CA, USA
Abstract: Self-Supervised Learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose RS3L, a novel simulation-based SSL strategy that employs a method of re-simulation to drive data augmentation for contrastive learning. By intervening in the middle of the simulation process and re-running simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pre-training enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
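As a minimal sketch of the contrastive stage (assuming a SimCLR-style NT-Xent objective over paired embeddings; the RS3L network and exact loss are not reproduced here), two re-simulated realizations of each event serve as the positive pairs:

    import numpy as np

    def nt_xent_loss(z_a, z_b, temperature=0.1):
        """SimCLR-style NT-Xent contrastive loss on paired embeddings.

        z_a, z_b: (batch, dim) embeddings of two re-simulated realizations
        of the same events; row i of z_a is the positive partner of row i
        of z_b, and all other rows act as negatives.
        """
        z = np.concatenate([z_a, z_b], axis=0)
        z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
        sim = z @ z.T / temperature
        np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
        n = len(z_a)
        pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
        log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(2 * n), pos].mean()

Minimizing this loss pulls the embeddings of re-simulated copies of an event together while pushing apart embeddings of different events, which is what makes the simulator-driven augmentations informative.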
Abstract: We propose masked particle modeling (MPM) as a self-supervised method for learning generic, transferable, and reusable representations on unordered sets of inputs for use in high energy physics (HEP) scientific data. This work provides a novel scheme to perform masked modeling based pre-training to learn permutation invariant functions on sets. More generally, this work provides a step towards building large foundation models for HEP that can be generically pre-trained with self-supervised learning and later fine-tuned for a variety of downstream tasks. In MPM, particles in a set are masked and the training objective is to recover their identity, as defined by a discretized token representation of a pre-trained vector quantized variational autoencoder. We study the efficacy of the method in samples of high energy jets at collider physics experiments, including studies on the impact of discretization, permutation invariance, and ordering. We also study the fine-tuning capability of the model, showing that it can be adapted to tasks such as supervised and weakly supervised jet classification, and that the model can transfer efficiently with small fine-tuning data sets to new classes and new data domains.
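A minimal sketch of the masking step (the mask rate, the reserved mask-token id, and the function name are illustrative assumptions; the tokenization itself comes from the pre-trained VQ-VAE described above):

    import numpy as np

    def mask_particles(tokens, mask_rate=0.3, mask_token=0, rng=None):
        """BERT-style masking over a set of particle tokens.

        tokens: (n_particles,) integer codes assigned to each particle by a
        pre-trained VQ-VAE codebook. Masked positions are replaced by a
        reserved mask token and become the prediction targets.
        """
        if rng is None:
            rng = np.random.default_rng()
        mask = rng.random(tokens.shape) < mask_rate
        inputs = np.where(mask, mask_token, tokens)
        return inputs, tokens, mask  # predict tokens where mask is True

The backbone consuming the masked inputs would then be a permutation-invariant set model (e.g., a transformer without positional encodings) trained with cross-entropy on the masked positions only.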
Abstract: We propose a differentiable vertex fitting algorithm that can be used for secondary vertex fitting, and that can be seamlessly integrated into neural networks for jet flavour tagging. Vertex fitting is formulated as an optimization problem where gradients of the optimized solution vertex are defined through implicit differentiation and can be passed to upstream or downstream neural network components for network training. More broadly, this is an application of differentiable programming to integrate physics knowledge into neural network models in high energy physics. We demonstrate how differentiable secondary vertex fitting can be integrated into larger transformer-based models for flavour tagging and improve heavy flavour jet classification.
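The implicit-differentiation step can be illustrated on a toy problem. In this sketch (our construction, assuming straight-line tracks with unit direction vectors, not the paper's fitter), the fitted vertex minimizes a quadratic chi-squared, and the implicit function theorem supplies its gradient with respect to the track parameters:

    import jax
    import jax.numpy as jnp

    def chi2(v, points, dirs):
        """Toy objective: summed squared perpendicular distances from a
        candidate vertex v to straight-line tracks (point, unit direction)."""
        r = points - v
        perp = r - jnp.sum(r * dirs, axis=1, keepdims=True) * dirs
        return jnp.sum(perp ** 2)

    def fit_vertex(points, dirs):
        """Inner fit: chi2 is quadratic in v, so one Newton step from the
        origin lands exactly on the minimizing vertex."""
        g = jax.grad(chi2)(jnp.zeros(3), points, dirs)
        H = jax.hessian(chi2)(jnp.zeros(3), points, dirs)
        return -jnp.linalg.solve(H, g)

    def dvertex_dpoints(v_star, points, dirs):
        """Implicit differentiation: at the optimum grad_v chi2 = 0, so the
        implicit function theorem gives dv*/dpoints = -H_vv^{-1} H_vp."""
        H_vv = jax.hessian(chi2, argnums=0)(v_star, points, dirs)
        H_vp = jax.jacobian(jax.grad(chi2, argnums=0), argnums=1)(
            v_star, points, dirs)
        return -jnp.linalg.solve(
            H_vv, H_vp.reshape(3, -1)).reshape((3,) + points.shape)

Because the gradient of the fitted vertex is defined through the optimality condition rather than by unrolling the solver, the fit can sit between neural network components and still propagate gradients during training.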
Abstract: We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Differentiating such programs can therefore open the way for gradient-based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so, we develop, to the best of our knowledge, the first fully differentiable branching program.
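As one concrete strategy (a score-function/REINFORCE estimator on a single Bernoulli branching decision; the toy payoff and names are our assumptions, and the Stochastic AD method works differently), discrete randomness can be differentiated without a smooth relaxation:

    import numpy as np

    def score_function_gradient(theta, n_samples=100_000, rng=None):
        """Score-function (REINFORCE) estimate of d/dtheta E[f(b)] for a
        branching decision b ~ Bernoulli(theta), using
        d/dtheta E[f] = E[f(b) * d/dtheta log p(b; theta)].
        """
        if rng is None:
            rng = np.random.default_rng(0)
        b = rng.random(n_samples) < theta
        f = np.where(b, 1.0, 0.2)  # toy per-branch payoff
        score = np.where(b, 1.0 / theta, -1.0 / (1.0 - theta))
        return np.mean(f * score)  # unbiased; true value is f(1) - f(0) = 0.8

The estimator is unbiased for any theta in (0, 1) but can have high variance, which is one reason for comparing several estimation strategies.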
Abstract: Estimating uncertainty is at the core of performing scientific measurements in high energy physics (HEP): a measurement is not useful without an estimate of its uncertainty. The goal of uncertainty quantification (UQ) is inextricably linked to the question, "how do we physically and statistically interpret these uncertainties?" The answer depends not only on the computational task we aim to undertake, but also on the methods we use for that task. For artificial intelligence (AI) applications in HEP, there are several areas where interpretable methods for UQ are essential, including inference, simulation, and control/decision-making. There exist some methods for each of these areas, but they have not yet been demonstrated to be as trustworthy as the more traditional approaches currently employed in physics (e.g., non-AI frequentist and Bayesian methods). Shedding light on these questions requires additional understanding of the interplay of AI systems and uncertainty quantification. We briefly discuss the existing methods in each area and relate them to tasks across HEP. We then discuss recommendations for avenues to pursue to develop the necessary techniques for reliable, widespread usage of AI with UQ over the next decade.
Abstract: Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
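A minimal conversion sketch (assuming an hls4ml release with the RNN support described here and a TensorFlow/Keras environment; the shapes, project directory, and FPGA part number are placeholders):

    import hls4ml
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.LSTM(32, input_shape=(20, 6)),    # 20 constituents x 6 features
        layers.Dense(5, activation='softmax'),   # 5 jet classes
    ])

    config = hls4ml.utils.config_from_keras_model(model, granularity='model')
    hls_model = hls4ml.converters.convert_from_keras_model(
        model,
        hls_config=config,
        output_dir='hls4ml_lstm_prj',            # placeholder project directory
        part='xcu250-figd2104-2L-e',             # placeholder FPGA part
    )
    hls_model.compile()  # builds the bit-accurate C simulation

The latency and resource trade-offs mentioned above are steered through the generated configuration, for example by adjusting numerical precision and reuse factors before conversion.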
Abstract: We present a light field imaging system that captures multiple views of an object with a single shot. The system is designed to maximize the total light collection by accepting a larger solid angle of light than a conventional lens with equivalent depth of field. This is achieved by populating a plane of virtual objects using mirrors and fully utilizing the available field of view and depth of field. Simulation results demonstrate that this design is capable of single-shot tomography of objects of size $\mathcal{O}$(1 mm$^3$), reconstructing the 3-dimensional (3D) distribution and features not accessible from any single view angle in isolation. In particular, for atom clouds used in atom interferometry experiments, the system can reconstruct 3D fringe patterns of size $\mathcal{O}$(100 $\mu$m). We also demonstrate this system with a 3D-printed prototype. The prototype is used to take images of $\mathcal{O}$(1 mm$^3$) sized objects, and 3D reconstruction algorithms running on a single-shot image successfully reconstruct $\mathcal{O}$(100 $\mu$m) internal features. The prototype also shows that the system can be built with 3D printing technology and hence can be deployed quickly and cost-effectively in experiments that require enhanced light collection or 3D reconstruction. Imaging of cold atom clouds in atom interferometry is a key application of this new type of imaging device, where enhanced light collection, high depth of field, and 3D tomographic reconstruction can provide new handles to characterize the atom clouds.
Abstract: Many physical systems can be best understood as sets of discrete data with associated relationships. Where previously these sets of data have been formulated as series or image data to match the available machine learning architectures, with the advent of graph neural networks (GNNs), these systems can now be learned natively as graphs. This allows a wide variety of high- and low-level physical features to be attached to measurements and, by the same token, a wide variety of HEP tasks to be accomplished by the same GNN architectures. GNNs have found powerful use cases in reconstruction, tagging, generation, and end-to-end analysis. With the widespread adoption of GNNs in industry, the HEP community is well-placed to benefit from rapid improvements in GNN latency and memory usage. However, industry use cases are not perfectly aligned with HEP, and much work needs to be done to best match unique GNN capabilities to unique HEP obstacles. We present here a range of these capabilities, noting which are already well-adopted in HEP communities and which are still immature. We hope to capture the landscape of graph techniques in machine learning as well as point out the most significant gaps that are inhibiting potentially large leaps in research.
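For readers new to the formalism, a single round of message passing (our minimal numpy sketch; real HEP GNNs use deeper, learned variants) makes "learned natively as graphs" concrete:

    import numpy as np

    def message_passing(x, edges, w_msg, w_upd):
        """One round of sum-aggregation message passing.

        x: (n_nodes, d) node features, e.g. hits or track parameters
        edges: (n_edges, 2) directed (src, dst) index pairs
        w_msg: (d, h) message weights; w_upd: (d + h, d_out) update weights
        """
        src, dst = edges[:, 0], edges[:, 1]
        msgs = np.tanh(x[src] @ w_msg)             # message from each source node
        agg = np.zeros((x.shape[0], w_msg.shape[1]))
        np.add.at(agg, dst, msgs)                  # sum messages at each destination
        return np.tanh(np.concatenate([x, agg], axis=1) @ w_upd)

Because the update depends only on a node's own features and an aggregation over its neighbors, the same weights apply to events with any number of hits or particles.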
Abstract: The computational cost of high energy physics detector simulation in future experimental facilities is going to exceed the currently available resources. To overcome this challenge, new ideas on surrogate models using machine learning methods are being explored to replace computationally expensive components. Additionally, differentiable programming has been proposed as a complementary approach, providing controllable and scalable simulation routines. In this document, new and ongoing efforts for surrogate models and differentiable programming applied to detector simulation are discussed in the context of the 2021 Particle Physics Community Planning Exercise ('Snowmass').
Abstract: MadJax is a tool for generating and evaluating differentiable matrix elements of high energy scattering processes. As such, it is a step towards a differentiable programming paradigm in high energy physics that facilitates the incorporation of high energy physics domain knowledge, encoded in simulation software, into gradient-based learning and optimization pipelines. MadJax comprises two components: (a) a plugin to the general purpose matrix element generator MadGraph that integrates matrix element and phase space sampling code with the JAX differentiable programming framework, and (b) a standalone wrapping API for accessing the matrix element code and its gradients, which are computed with automatic differentiation. We present the MadJax implementation together with example applications in simulation-based inference and normalizing-flow-based matrix element modeling, whose capabilities are uniquely enabled by differentiable matrix elements.
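A hypothetical usage sketch (the Breit-Wigner stand-in and all names here are illustrative, not MadJax's documented API): once a matrix element is JAX-traceable, its gradients follow from automatic differentiation:

    import jax

    def toy_me(s, m=91.19, width=2.5):
        """Stand-in for a MadJax-generated |M|^2: a Breit-Wigner in the
        invariant s; the real function would come from the MadGraph plugin."""
        return 1.0 / ((s - m ** 2) ** 2 + m ** 2 * width ** 2)

    # Gradients with respect to kinematic inputs come for free and can flow
    # into downstream objectives, e.g. training a normalizing flow to model
    # the matrix element.
    dme_ds = jax.grad(toy_me)
    print(dme_ds(90.0 ** 2))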