Pengle Zhang

SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Nov 17, 2024

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration

Oct 03, 2024

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

Feb 07, 2024

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

Oct 24, 2023