Paul Whatmough

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

Dec 02, 2024
Via arXiv

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference

Nov 27, 2024
Via arXiv

Rapid Switching and Multi-Adapter Fusion via Sparse High Rank Adapters

Jul 22, 2024
Via arXiv

Sparse High Rank Adapters

Jun 19, 2024
Via arXiv

Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator

Apr 14, 2024
Via arXiv

GPTVQ: The Blessing of Dimensionality for LLM Quantization

Feb 23, 2024
Via arXiv

PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices

Jan 26, 2023
Via arXiv

Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

Dec 05, 2022
Via arXiv

Restructurable Activation Networks

Aug 17, 2022
Via arXiv

UDC: Unified DNAS for Compressible TinyML Models

Jan 21, 2022
Via arXiv