Markus Nagel

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Dec 02, 2024

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
Nov 27, 2024

Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters
Jul 22, 2024

Sparse High Rank Adapters
Jun 19, 2024

Low-Rank Quantization-Aware Training for LLMs
Jun 10, 2024

GPTVQ: The Blessing of Dimensionality for LLM Quantization
Feb 23, 2024

The LLM Surgeon
Dec 28, 2023

MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device
Oct 02, 2023

Softmax Bias Correction for Quantized Generative Models
Sep 04, 2023

ResQ: Residual Quantization for Video Perception
Aug 18, 2023