Yeqi Huang

WaferLLM: A Wafer-Scale LLM Inference System

Feb 06, 2025

MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems

Dec 10, 2024

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Jan 25, 2024