Fangcheng Fu

Scheduling LLM Inference with Uncertainty-Aware Output Length Predictions

Apr 01, 2026

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Mar 27, 2026

LAER-MoE: Load-Adaptive Expert Re-layout for Efficient Mixture-of-Experts Training

Feb 12, 2026

SALE: Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling

May 30, 2025

Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately

May 19, 2025

Galvatron: An Automatic Distributed System for Efficient Foundation Model Training

Apr 30, 2025

ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs

Feb 28, 2025

Training-free and Adaptive Sparse Attention for Efficient Long Video Generation

Feb 28, 2025

Demystifying Workload Imbalances in Large Transformer Model Training over Variable-length Sequences

Dec 10, 2024

Data-Centric and Heterogeneity-Adaptive Sequence Parallelism for Efficient LLM Training

Dec 02, 2024