Gabriele Oliaro

SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference

Nov 07, 2024

Optimal Kernel Orchestration for Tensor Programs with Korch

Jun 13, 2024

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

Feb 29, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Jan 13, 2024

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Dec 23, 2023

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

May 16, 2023