Elias Frantar

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models
Aug 21, 2024

Extreme Compression of Large Language Models via Additive Quantization
Jan 11, 2024

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Oct 25, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models
Oct 13, 2023

Sparse Fine-tuning for Inference Acceleration of Large Language Models
Oct 13, 2023

Scaling Laws for Sparsely-Connected Foundation Models
Sep 15, 2023

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Aug 03, 2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
Jul 07, 2023

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Jun 05, 2023

JaxPruner: A concise library for sparsity research
May 02, 2023