Leyang Xue

MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Dec 10, 2024
Via arXiv

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Jan 25, 2024
Via arXiv

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

Jan 25, 2024
Via arXiv