Yash Akhauri

xKV: Cross-Layer SVD for KV-Cache Compression

Mar 24, 2025
Via arXiv

TokenButler: Token Importance is Predictable

Mar 10, 2025
Via arXiv

SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs

Feb 18, 2025
Via arXiv

The Power of Negative Zero: Datatype Customization for Quantized Large Language Models

Jan 06, 2025
Via arXiv

Attamba: Attending To Multi-Token States

Nov 26, 2024
Via arXiv

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

Jun 24, 2024
Via arXiv

Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models

Apr 07, 2024
Via arXiv

Encodings for Prediction-based Neural Architecture Search

Mar 04, 2024
Via arXiv

On Latency Predictors for Neural Architecture Search

Mar 04, 2024
Via arXiv

Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search

Jun 04, 2023
Via arXiv