
Gunho Park

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

Feb 28, 2024

nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models

Jun 20, 2022