Jason Ramapuram

The Design Space of Tri-Modal Masked Diffusion Models

Feb 25, 2026
Via arXiv

A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation

Feb 09, 2026
Via arXiv

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

Dec 26, 2025
Via arXiv

Learning Unmasking Policies for Diffusion Language Models

Dec 12, 2025
Via arXiv

Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training

Dec 09, 2025
Via arXiv

Distillation Scaling Laws

Feb 12, 2025
Via arXiv

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Sep 06, 2024
Via arXiv

Poly-View Contrastive Learning

Mar 08, 2024
Via arXiv

Bootstrap Your Own Variance

Dec 06, 2023
Via arXiv

How to Scale Your EMA

Jul 27, 2023
Via arXiv