Safeen Huda

HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration

Feb 27, 2025
Via arXiv

Attamba: Attending To Multi-Token States

Nov 26, 2024
Via arXiv

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

Jun 24, 2024
Via arXiv

A Full-stack Accelerator Search Technique for Vision Applications

May 26, 2021
Via arXiv