Ying Sheng

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs

Nov 18, 2024
Via arXiv

AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

Nov 05, 2024
Via arXiv

Post-Training Sparse Attention with Double Sparsity

Aug 11, 2024
Via arXiv

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

Jun 20, 2024
Via arXiv

DafnyBench: A Benchmark for Formal Software Verification

Jun 12, 2024
Via arXiv

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Mar 07, 2024
Via arXiv

Fairness in Serving Large Language Models

Dec 31, 2023
Via arXiv

Efficiently Programming Large Language Models using SGLang

Dec 12, 2023
Via arXiv

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Nov 07, 2023
Via arXiv

Clover: Closed-Loop Verifiable Code Generation

Oct 26, 2023
Via arXiv