Mart van Baalen

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Dec 02, 2024

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Nov 27, 2024

GPTVQ: The Blessing of Dimensionality for LLM Quantization
Feb 23, 2024

The LLM Surgeon
Dec 28, 2023

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training
Jul 10, 2023

Pruning vs Quantization: Which is Better?
Jul 06, 2023

FP8 versus INT8 for efficient deep learning inference
Mar 31, 2023

A Practical Mixed Precision Algorithm for Post-Training Quantization
Feb 10, 2023

Quantized Sparse Weight Decomposition for Neural Network Compression
Jul 22, 2022

Cyclical Pruning for Sparse Neural Networks
Feb 02, 2022