Beidi Chen

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Feb 18, 2025

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Feb 08, 2025

GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
Feb 07, 2025

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation
Feb 05, 2025

S$^{2}$FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
Dec 10, 2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Nov 14, 2024

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Oct 28, 2024

MagicPIG: LSH Sampling for Efficient LLM Generation
Oct 21, 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild
Oct 07, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs
Sep 05, 2024