Zihao Zeng

ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference

Oct 23, 2024
Via arXiv

MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection

Oct 16, 2024
Via arXiv

In-context KV-Cache Eviction for LLMs via Attention-Gate

Oct 15, 2024
Via arXiv

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

Jun 19, 2024
Via arXiv