Luke de Oliveira

The Llama 3 Herd of Models

Jul 31, 2024

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan (+521 more)

Abstract:Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Emergent Properties of Finetuned Language Representation Models

Oct 23, 2019

Alexandre Matton, Luke de Oliveira

Abstract:Large, self-supervised transformer-based language representation models have recently received significant amounts of attention, and have produced state-of-the-art results across a variety of tasks simply by scaling up pre-training on larger and larger corpora. Such models usually produce high dimensional vectors, on top of which additional task-specific layers and architectural modifications are added to adapt them to specific downstream tasks. Though there exists ample evidence that such models work well, we aim to understand what happens when they work well. We analyze the redundancy and location of information contained in output vectors for one such language representation model -- BERT. We show empirical evidence that the [CLS] embedding in BERT contains highly redundant information, and can be compressed with minimal loss of accuracy, especially for finetuned models, dovetailing into open threads in the field about the role of over-parameterization in learning. We also shed light on the existence of specific output dimensions which alone give very competitive results when compared to using all dimensions of output vectors.

* 7 pages

Repurposing Decoder-Transformer Language Models for Abstractive Summarization

Sep 01, 2019

Luke de Oliveira, Alfredo Láinez Rodrigo

Abstract:Neural network models have shown excellent fluency and performance when applied to abstractive summarization. Many approaches to neural abstractive summarization involve the introduction of significant inductive bias, exemplified through the use of components such as pointer-generator architectures, coverage, and partially extractive procedures, designed to mimic the process by which humans summarize documents. We show that it is possible to attain competitive performance by instead directly viewing summarization as a language modeling problem and effectively leveraging transfer learning. We introduce a simple procedure built upon decoder-transformers to obtain highly competitive ROUGE scores for summarization performance using a language modeling loss alone, with no beam-search or other decoding-time optimization, and instead relying on efficient nucleus sampling and greedy decoding.

CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks

Dec 21, 2017

Michela Paganini, Luke de Oliveira, Benjamin Nachman

Abstract:The precise modeling of subatomic particle interactions and propagation through matter is paramount for the advancement of nuclear and particle physics searches and precision measurements. The most computationally expensive step in the simulation pipeline of a typical experiment at the Large Hadron Collider (LHC) is the detailed modeling of the full complexity of physics processes that govern the motion and evolution of particle showers inside calorimeters. We introduce CaloGAN, a new fast simulation technique based on generative adversarial networks (GANs). We apply these neural networks to the modeling of electromagnetic showers in a longitudinally segmented calorimeter, and achieve speedup factors comparable to or better than existing full simulation techniques on CPU ($100\times$-$1000\times$) and even faster on GPU (up to $\sim10^5\times$). There are still challenges for achieving precision across the entire phase space, but our solution can reproduce a variety of geometric shower shape properties of photons, positrons and charged pions. This represents a significant stepping stone toward a full neural network-based detector simulation that could save significant computing time and enable many analyses now and in the future.

* Phys. Rev. D 97, 014021 (2018)
* 14 pages, 4 tables, 13 figures; version accepted by Physical Review D (PRD)

Accelerating Science with Generative Adversarial Networks: An Application to 3D Particle Showers in Multi-Layer Calorimeters

Dec 21, 2017

Michela Paganini, Luke de Oliveira, Benjamin Nachman

Abstract:Physicists at the Large Hadron Collider (LHC) rely on detailed simulations of particle collisions to build expectations of what experimental data may look like under different theory modeling assumptions. Petabytes of simulated data are needed to develop analysis techniques, though they are expensive to generate using existing algorithms and computing resources. The modeling of detectors and the precise description of particle cascades as they interact with the material in the calorimeter are the most computationally demanding steps in the simulation pipeline. We therefore introduce a deep neural network-based generative model to enable high-fidelity, fast, electromagnetic calorimeter simulation. There are still challenges for achieving precision across the entire phase space, but our current solution can reproduce a variety of particle shower properties while achieving speed-up factors of up to 100,000$\times$. This opens the door to a new era of fast simulation that could save significant computing time and disk space, while extending the reach of physics searches and precision measurements at the LHC and beyond.

* Phys. Rev. Lett. 120, 042003 (2018)
* 6 pages, 3 figures; version accepted by Physical Review Letters (PRL)

Controlling Physical Attributes in GAN-Accelerated Simulation of Electromagnetic Calorimeters

Nov 23, 2017

Luke de Oliveira, Michela Paganini, Benjamin Nachman

Abstract:High-precision modeling of subatomic particle interactions is critical for many fields within the physical sciences, such as nuclear physics and high energy particle physics. Most simulation pipelines in the sciences are computationally intensive -- in a variety of scientific fields, Generative Adversarial Networks have been suggested as a solution to speed up the forward component of simulation, with promising results. An important component of any simulation system for the sciences is the ability to condition on any number of physically meaningful latent characteristics that can affect the forward generation procedure. We introduce an auxiliary task to the training of a Generative Adversarial Network on particle showers in a multi-layer electromagnetic calorimeter, which allows our model to learn an attribute-aware conditioning mechanism.

* 7 pages, 5 figures, in proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017)

Learning Particle Physics by Example: Location-Aware Generative Adversarial Networks for Physics Synthesis

Jun 13, 2017

Luke de Oliveira, Michela Paganini, Benjamin Nachman

Abstract:We provide a bridge between generative modeling in the Machine Learning community and simulated physical processes in High Energy Particle Physics by applying a novel Generative Adversarial Network (GAN) architecture to the production of jet images -- 2D representations of energy depositions from particles interacting with a calorimeter. We propose a simple architecture, the Location-Aware Generative Adversarial Network, that learns to produce realistic radiation patterns from simulated high energy particle collisions. The pixel intensities of GAN-generated images faithfully span over many orders of magnitude and exhibit the desired low-dimensional physical properties (i.e., jet mass, n-subjettiness, etc.). We shed light on limitations, and provide a novel empirical validation of image quality and validity of GAN-produced simulations of the natural world. This work provides a base for further explorations of GANs for use in faster simulation in High Energy Particle Physics.

* Comput Softw Big Sci (2017) 1: 4
* 23 pages, 23 figures, 1 table, and appendix; Added new validation metric, acknowledgements, minor corrections

Jet-Images -- Deep Learning Edition

Jan 22, 2017

Luke de Oliveira, Michael Kagan, Lester Mackey, Benjamin Nachman, Ariel Schwartzman

Abstract:Building on the notion of a particle physics detector as a camera and the collimated streams of high energy particles, or jets, it measures as an image, we investigate the potential of machine learning techniques based on deep learning architectures to identify highly boosted W bosons. Modern deep learning algorithms trained on jet images can out-perform standard physically-motivated feature driven approaches to jet tagging. We develop techniques for visualizing how these features are learned by the network and what additional information is used to improve performance. This interplay between physically-motivated feature driven tools and supervised learning algorithms is general and can be used to significantly increase the sensitivity to discover new particles and new forces, and gain a deeper understanding of the physics within jets.

* JHEP 07 (2016) 069
* 32 pages, 24 figures. Version that is published in JHEP
