
Mehdi Rezagholizadeh

Huawei Noah's Ark Lab

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression

Mar 14, 2025

Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models

Mar 06, 2025

The Order Effect: Investigating Prompt Sensitivity in Closed-Source LLMs

Feb 06, 2025

ReGLA: Refining Gated Linear Attention

Feb 03, 2025

Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression

Dec 07, 2024

Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination

Oct 22, 2024

Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity

Oct 01, 2024

EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models

Sep 22, 2024

Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

Aug 16, 2024

S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

Jul 02, 2024