
Ke Hong

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Dec 27, 2024

A Survey on Efficient Inference for Large Language Models

Apr 22, 2024

FlashDecoding++: Faster Large Language Model Inference on GPUs

Nov 10, 2023

Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

Jul 17, 2023