Xiuhong Li

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Jun 28, 2024
Via arXiv

Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention

Jun 17, 2024
Via arXiv

SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

May 10, 2024
Via arXiv

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024
Via arXiv

FlashDecoding++: Faster Large Language Model Inference on GPUs

Nov 10, 2023
Via arXiv

Exploring Multimodal Sentiment Analysis via CBAM Attention and Double-layer BiLSTM Architecture

Mar 26, 2023
Via arXiv