Shiyi Cao

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Nov 02, 2024

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Jun 24, 2024

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Jun 06, 2024

Optimizing LLM Queries in Relational Workloads

Mar 09, 2024

Fairness in Serving Large Language Models

Dec 31, 2023

Efficiently Programming Large Language Models using SGLang

Dec 12, 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Nov 07, 2023