Picture for Shiyi Cao

Shiyi Cao

NVILA: Efficient Frontier Visual Language Models

Add code
Dec 05, 2024
Figure 1 for NVILA: Efficient Frontier Visual Language Models
Figure 2 for NVILA: Efficient Frontier Visual Language Models
Figure 3 for NVILA: Efficient Frontier Visual Language Models
Figure 4 for NVILA: Efficient Frontier Visual Language Models
Viaarxiv icon

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

Add code
Nov 18, 2024
Viaarxiv icon

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Add code
Nov 02, 2024
Figure 1 for NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Figure 2 for NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Figure 3 for NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Figure 4 for NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Viaarxiv icon

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Add code
Jun 24, 2024
Figure 1 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Figure 2 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Figure 3 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Figure 4 for GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Viaarxiv icon

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Add code
Jun 06, 2024
Viaarxiv icon

Optimizing LLM Queries in Relational Workloads

Add code
Mar 09, 2024
Viaarxiv icon

Fairness in Serving Large Language Models

Add code
Dec 31, 2023
Figure 1 for Fairness in Serving Large Language Models
Figure 2 for Fairness in Serving Large Language Models
Figure 3 for Fairness in Serving Large Language Models
Figure 4 for Fairness in Serving Large Language Models
Viaarxiv icon

Efficiently Programming Large Language Models using SGLang

Add code
Dec 12, 2023
Viaarxiv icon

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Add code
Nov 07, 2023
Figure 1 for S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Figure 2 for S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Figure 3 for S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Figure 4 for S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Viaarxiv icon