Baris Kasikci

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Jun 16, 2024
Via arXiv

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

Feb 10, 2024
Via arXiv

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023
Via arXiv