Mohamed S. Abdelfattah

xKV: Cross-Layer SVD for KV-Cache Compression

Mar 24, 2025

TokenButler: Token Importance is Predictable

Mar 10, 2025

SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs

Feb 18, 2025

The Power of Negative Zero: Datatype Customization for Quantized Large Language Models

Jan 06, 2025

NITRO: LLM Inference on Intel Laptop NPUs

Dec 15, 2024

Attamba: Attending To Multi-Token States

Nov 26, 2024

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration

Nov 18, 2024

BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration

Sep 08, 2024

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

May 06, 2024

Encodings for Prediction-based Neural Architecture Search

Mar 04, 2024