Jonathan Ragan-Kelley

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Feb 17, 2025

Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping

Jan 11, 2025

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

Jul 15, 2024

WatChat: Explaining perplexing programs by debugging mental models

Mar 08, 2024

Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding

Feb 07, 2024

How to guess a gradient

Dec 07, 2023

Striped Attention: Faster Ring Attention for Causal Transformers

Nov 15, 2023

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Oct 07, 2023

Differentiating Metropolis-Hastings to Optimize Intractable Densities

Jun 30, 2023

Acting as Inverse Inverse Planning

May 26, 2023