Danyang Zhuo

Duke University

HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs

Apr 04, 2025
Via arXiv

Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

Jul 05, 2024
Via arXiv

VcLLM: Video Codecs are Secretly Tensor Codecs

Jun 29, 2024
Via arXiv

Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution

May 29, 2024
Via arXiv

Adaptive Skeleton Graph Decoding

Feb 19, 2024
Via arXiv

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Jan 17, 2024
Via arXiv

Curator: Efficient Indexing for Multi-Tenant Vector Databases

Jan 13, 2024
Via arXiv

Fairness in Serving Large Language Models

Dec 31, 2023
Via arXiv

Punica: Multi-Tenant LoRA Serving

Oct 28, 2023
Via arXiv

Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis

Jun 06, 2023
Via arXiv