Kushal Tirumala

CAT: Content-Adaptive Image Tokenization

Jan 06, 2025, via arXiv

When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization

Dec 20, 2024, via arXiv

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Aug 20, 2024, via arXiv

Brevity is the soul of wit: Pruning long files for code generation

Jun 29, 2024, via arXiv

An Introduction to Vision-Language Modeling

May 27, 2024, via arXiv

Text Quality-Based Pruning for Efficient Training of Language Models

Apr 26, 2024, via arXiv

The Unreasonable Ineffectiveness of the Deeper Layers

Mar 26, 2024, via arXiv

Effective pruning of web-scale datasets based on complexity of concept clusters

Jan 09, 2024, via arXiv

Decoding Data Quality via Synthetic Corruptions: Embedding-guided Pruning of Code Data

Dec 05, 2023, via arXiv

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

Aug 23, 2023, via arXiv