Mike Lewis

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Nov 07, 2024
Via arXiv

Law of the Weakest Link: Cross Capabilities of Large Language Models

Sep 30, 2024
Via arXiv

The Llama 3 Herd of Models

Jul 31, 2024
Via arXiv

MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

Jul 31, 2024
Via arXiv

Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training

May 06, 2024
Via arXiv

In-Context Pretraining: Language Modeling Beyond Document Boundaries

Oct 20, 2023
Via arXiv

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

Oct 08, 2023
Via arXiv

Efficient Streaming Language Models with Attention Sinks

Sep 29, 2023
Via arXiv

Contrastive Decoding Improves Reasoning in Large Language Models

Sep 29, 2023
Via arXiv

Effective Long-Context Scaling of Foundation Models

Sep 27, 2023
Via arXiv