Mike Lewis

Byte Latent Transformer: Patches Scale Better Than Tokens

Dec 13, 2024

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Nov 07, 2024

Law of the Weakest Link: Cross Capabilities of Large Language Models

Sep 30, 2024

The Llama 3 Herd of Models

Jul 31, 2024

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Jul 31, 2024

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

May 06, 2024

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Oct 20, 2023

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

Oct 08, 2023

Contrastive Decoding Improves Reasoning in Large Language Models

Sep 29, 2023

Efficient Streaming Language Models with Attention Sinks

Sep 29, 2023