Coleman Hooper

ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
Oct 06, 2025

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
Aug 14, 2025

Multipole Attention for Efficient Long Context Reasoning
Jun 16, 2025

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Apr 19, 2025

ETS: Efficient Tree Search for Inference-Time Scaling
Feb 19, 2025

Squeezed Attention: Accelerating Long Context Length LLM Inference
Nov 14, 2024

TinyAgent: Function Calling at the Edge
Sep 01, 2024

AI and Memory Wall
Mar 21, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Feb 07, 2024

Learned Best-Effort LLM Serving
Jan 15, 2024