Liang Ye

Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference

Dec 06, 2024
Via arXiv