Pengle Zhang

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Oct 03, 2024

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

Feb 07, 2024

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

Oct 24, 2023