Chengruidong Zhang

SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Dec 13, 2024

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Sep 16, 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Jul 02, 2024

Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
May 30, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Apr 23, 2024

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Feb 21, 2024