Xupeng Miao

Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management

Sep 05, 2024

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Jun 24, 2024

Optimal Kernel Orchestration for Tensor Programs with Korch

Jun 13, 2024

Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

Jun 03, 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

Feb 29, 2024

Generative Dense Retrieval: Memory Can Be a Burden

Jan 19, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Jan 13, 2024

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Dec 23, 2023

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Nov 27, 2023

Experimental Analysis of Large-scale Learnable Vector Storage Compression

Nov 27, 2023