
Kushal Tirumala

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Aug 20, 2024

Brevity is the soul of wit: Pruning long files for code generation

Jun 29, 2024

An Introduction to Vision-Language Modeling

May 27, 2024

Text Quality-Based Pruning for Efficient Training of Language Models

Apr 26, 2024

The Unreasonable Ineffectiveness of the Deeper Layers

Mar 26, 2024

Effective pruning of web-scale datasets based on complexity of concept clusters

Jan 09, 2024

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

Dec 05, 2023

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Aug 23, 2023

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

Mar 22, 2023

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

May 22, 2022