Yerui Sun

Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference

Dec 06, 2024
Via arXiv

Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs

May 23, 2024
Via arXiv

A Speed Odyssey for Deployable Quantization of LLMs

Nov 16, 2023
Via arXiv

FPTQ: Fine-grained Post-Training Quantization for Large Language Models

Aug 30, 2023
Via arXiv