Woosuk Kwon

Gemma 2: Improving Open Language Models at a Practical Size

Aug 02, 2024
Via arXiv

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Jun 20, 2024
Via arXiv

Efficient Memory Management for Large Language Model Serving with PagedAttention

Sep 12, 2023
Via arXiv

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Dec 04, 2020
Via arXiv