Quentin Anthony

The Zamba2 Suite: Technical Report

Nov 22, 2024

RedPajama: an Open Dataset for Training Large Language Models

Nov 19, 2024

Zyda-2: a 5 Trillion Token High-Quality Dataset

Nov 09, 2024

Accelerating Large Language Model Training with Hybrid GPU-based Compression

Sep 04, 2024

Demystifying the Communication Characteristics for Distributed Transformer Models

Aug 19, 2024

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

Aug 09, 2024

Zyda: A 1.3T Dataset for Open Language Modeling

Jun 04, 2024

Zamba: A Compact 7B SSM Hybrid Model

May 26, 2024

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Apr 10, 2024

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Mar 26, 2024