Kan Zhu

TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval

Feb 28, 2025
Via arXiv

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs

Feb 17, 2025
Via arXiv

BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching

Nov 25, 2024
Via arXiv

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jun 16, 2024
Via arXiv

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Feb 10, 2024
Via arXiv

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023
Via arXiv

Practical Algorithms for Learning Near-Isometric Linear Embeddings

Apr 22, 2016
Via arXiv