Peizhuang Cong

BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching
Oct 24, 2024

INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
Sep 26, 2024

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing
Apr 25, 2024