
Coleman Hooper

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Aug 14, 2025

Multipole Attention for Efficient Long Context Reasoning

Jun 16, 2025

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference

Apr 19, 2025

ETS: Efficient Tree Search for Inference-Time Scaling

Feb 19, 2025

Squeezed Attention: Accelerating Long Context Length LLM Inference

Nov 14, 2024

TinyAgent: Function Calling at the Edge

Sep 01, 2024

AI and Memory Wall

Mar 21, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Feb 07, 2024

Learned Best-Effort LLM Serving

Jan 15, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Nov 07, 2023