
Zhuohan Li

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Jun 20, 2024

Overcoming systematic softening in universal machine learning interatomic potentials by fine-tuning

May 11, 2024

Fairness in Serving Large Language Models

Dec 31, 2023

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

Sep 30, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention

Sep 12, 2023

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Jun 09, 2023

What is the State of Memory Saving for Model Training?

Mar 26, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU

Mar 13, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

Feb 22, 2023

On Optimizing the Communication of Model Parallelism

Nov 10, 2022