Qingan Li

A$^2$ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization

Feb 18, 2025

CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

Sep 02, 2024