Xupeng Miao

Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management

Sep 05, 2024
Via arXiv

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Jun 24, 2024
Via arXiv

Optimal Kernel Orchestration for Tensor Programs with Korch

Jun 13, 2024
Via arXiv

Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

Jun 03, 2024
Via arXiv

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

Feb 29, 2024
Via arXiv

Generative Dense Retrieval: Memory Can Be a Burden

Jan 19, 2024
Via arXiv

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Jan 13, 2024
Via arXiv

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Dec 23, 2023
Via arXiv

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Nov 27, 2023
Via arXiv

Experimental Analysis of Large-scale Learnable Vector Storage Compression

Nov 27, 2023
Via arXiv