Li-Wen Chang

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
Apr 03, 2025

Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts
Feb 27, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Oct 28, 2024

NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques
Nov 13, 2019