Kushal Tirumala

CAT: Content-Adaptive Image Tokenization

Jan 06, 2025
Via arXiv

When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization

Dec 20, 2024
Via arXiv

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Aug 20, 2024
Via arXiv

Brevity is the soul of wit: Pruning long files for code generation

Jun 29, 2024
Via arXiv

An Introduction to Vision-Language Modeling

May 27, 2024
Via arXiv

Text Quality-Based Pruning for Efficient Training of Language Models

Apr 26, 2024
Via arXiv

The Unreasonable Ineffectiveness of the Deeper Layers

Mar 26, 2024
Via arXiv

Effective pruning of web-scale datasets based on complexity of concept clusters

Jan 09, 2024
Via arXiv

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

Dec 05, 2023
Via arXiv

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Aug 23, 2023
Via arXiv