Keqian Li

Sid

The Llama 3 Herd of Models

Jul 31, 2024

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan(+521 more)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

MetaCon: Unified Predictive Segments System with Trillion Concept Meta-Learning

Mar 09, 2022

Keqian Li, Yifan Hu, Logan Palanisamy, Lisa Jones, Akshay Gupta, Jason Grigsby, Ili Selinger, Matt Gillingham, Fei Tan

Abstract: Accurate understanding of users in terms of predictive segments plays an essential role in the day-to-day operation of modern internet enterprises. Nevertheless, significant challenges limit the quality of data, especially on long-tail predictive tasks. In this work, we present MetaCon, our unified predictive segments system with scalable, trillion-concept meta-learning that addresses these challenges. It builds on a flat concept representation that summarizes entities' heterogeneous digital footprints, jointly considers the entire spectrum of predictive tasks as a single learning task, and solves that task with a principled meta-learning approach that uses an efficient first-order meta-optimization procedure with a provable performance guarantee. Experiments on both proprietary production datasets and public structured learning tasks demonstrate that MetaCon can lead to substantial improvements over state-of-the-art recommendation and ranking approaches.
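
The abstract mentions an efficient first-order meta-optimization procedure but does not spell it out. To illustrate what a first-order meta-update looks like, here is a minimal sketch of Reptile, a published first-order meta-learning algorithm, used purely as a stand-in: it is not MetaCon's actual procedure, and it does not reproduce the provable guarantee mentioned above. It assumes each task is a small binary classification problem over one shared dense feature vector (a hypothetical substitute for the flat concept representation); the names adapt_to_task and reptile are illustrative only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def adapt_to_task(init_weights, examples, lr=0.1, epochs=5):
    """Inner loop (hypothetical helper): plain SGD on one task's logistic loss,
    starting from the shared initialization init_weights."""
    w = list(init_weights)
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of the log loss with respect to w is (p - y) * x.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def reptile(tasks, dim, meta_lr=0.5, meta_iters=200, seed=0):
    """Outer loop (Reptile, not MetaCon): learn a shared initialization phi.

    Each meta-iteration adapts phi to one sampled task, then moves phi a
    fraction meta_lr of the way toward the adapted weights. Only ordinary
    first-order gradients are computed; nothing is differentiated through
    the inner loop.
    """
    rng = random.Random(seed)
    phi = [0.0] * dim
    for _ in range(meta_iters):
        task = rng.choice(tasks)
        adapted = adapt_to_task(phi, task)
        phi = [p + meta_lr * (a - p) for p, a in zip(phi, adapted)]
    return phi
```

The learned phi is the useful output: a long-tail task with only a few labels can start its own SGD from phi instead of from zero. Because no second-order terms are ever formed, each meta-step costs roughly as much as ordinary training on one task.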

SuperCone: Modeling Heterogeneous Experts with Concept Meta-learning for Unified Predictive Segments System

Mar 09, 2022

Keqian Li, Yifan Hu

Abstract: Understanding users through predictive segments plays an essential role in modern enterprises, enabling more efficient information exchange. For example, by predicting whether a user has a particular interest in an area of sports or entertainment, we can serve the user more relevant and tailored content. However, there are many long-tail prediction tasks that are hard to capture with off-the-shelf model architectures due to data scarcity and task heterogeneity. In this work, we present SuperCone, our unified predictive segments system that addresses these challenges. It builds on a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each prediction task using an approach called "super learning", that is, combining prediction models with diverse architectures or learning methods that are incompatible with each other or even completely unknown. Building on this, we provide an end-to-end deep learning architecture that flexibly learns to attend to the best-suited heterogeneous experts while simultaneously learning deep representations of the input concepts that augment those experts by capturing unique signals. Experiments show that SuperCone can outperform state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks, as well as on several public structured-data learning benchmarks.
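
The abstract describes learning to attend to the best-suited heterogeneous experts while also learning a representation of the input concepts. The paper's architecture is not given here, so the following is only a minimal sketch under stated assumptions: each expert is a black box that returns a logit for the example, a softmax gate conditioned on a dense input vector weights those logits, and a linear term on the same input stands in for the learned concept representation. The class GatedExpertCombiner and its methods are hypothetical names, not SuperCone's API.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class GatedExpertCombiner:
    """Hypothetical input-conditioned softmax gate over K black-box experts,
    plus a linear residual term on the raw input features."""

    def __init__(self, num_experts, dim):
        self.gate_w = [[0.0] * dim for _ in range(num_experts)]  # one gate row per expert
        self.resid_w = [0.0] * dim  # stand-in for the learned concept representation

    def gate(self, x):
        """Attention weights over experts for input x (they sum to 1)."""
        return softmax([dot(row, x) for row in self.gate_w])

    def _logit(self, x, expert_scores):
        weights = self.gate(x)
        mixed = dot(weights, expert_scores)  # attention-weighted expert logit
        return weights, mixed, mixed + dot(self.resid_w, x)

    def predict(self, x, expert_scores):
        """Probability of the positive label given x and each expert's logit."""
        return sigmoid(self._logit(x, expert_scores)[2])

    def sgd_step(self, x, expert_scores, y, lr=0.1):
        """One SGD step on the log loss for a single labeled example."""
        weights, mixed, z = self._logit(x, expert_scores)
        err = sigmoid(z) - y  # dLoss/dz for the log loss
        for k, (w_k, s_k) in enumerate(zip(weights, expert_scores)):
            # dz/d(gate logit k) = w_k * (s_k - mixed) for a softmax-weighted average.
            coeff = err * w_k * (s_k - mixed)
            self.gate_w[k] = [g - lr * coeff * xi for g, xi in zip(self.gate_w[k], x)]
        self.resid_w = [r - lr * err * xi for r, xi in zip(self.resid_w, x)]
```

Because the gate only consumes each expert's output score, the experts themselves can be arbitrary or even opaque models, which fits the abstract's description of super learning. A production system would likely use deeper networks for both the gate and the residual term.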
