Shiyao Li

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

Sep 16, 2024
Via arXiv

GateAttentionPose: Enhancing Pose Estimation with Agent Attention and Improved Gated Convolutions

Sep 12, 2024
Via arXiv

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

Jun 21, 2024
Via arXiv

Can LLMs Learn by Teaching? A Preliminary Study

Jun 20, 2024
Via arXiv

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

Jun 04, 2024
Via arXiv

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024
Via arXiv

Evaluating Quantized Large Language Models

Feb 28, 2024
Via arXiv

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K

Feb 06, 2024
Via arXiv

FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs

Jan 09, 2024
Via arXiv

Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization

Nov 28, 2023
Via arXiv