Jiangfei Duan

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Jun 28, 2024

Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Jun 17, 2024

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

May 10, 2024

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Nov 27, 2023