Leyang Xue

On Harnessing Idle Compute at the Edge for Foundation Model Training

Dec 13, 2025

Towards Decentralized and Sustainable Foundation Model Training with the Edge

Jul 02, 2025

HybridServe: Efficient Serving of Large AI Models with Confidence-Based Cascade Routing

May 18, 2025

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

May 16, 2025

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching

Mar 12, 2025

MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Dec 10, 2024

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

Jan 25, 2024

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Jan 25, 2024