
Yash Akhauri

Regression Language Models for Code

Sep 30, 2025

Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion

May 27, 2025

SplitReason: Learning To Offload Reasoning

Apr 23, 2025

xKV: Cross-Layer SVD for KV-Cache Compression

Mar 24, 2025

TokenButler: Token Importance is Predictable

Mar 10, 2025

SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs

Feb 18, 2025

The Power of Negative Zero: Datatype Customization for Quantized Large Language Models

Jan 06, 2025

Attamba: Attending To Multi-Token States

Nov 26, 2024

ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

Jun 24, 2024

Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models

Apr 07, 2024