Hongchao Du

FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference

Mar 04, 2025

When Compression Meets Model Compression: Memory-Efficient Double Compression for Large Language Models

Feb 21, 2025

EvoP: Robust LLM Inference via Evolutionary Pruning

Feb 19, 2025

On the Compressibility of Quantized Large Language Models

Mar 03, 2024