Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Add code
Jan 19, 2024
Figure 1 for Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Figure 2 for Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Figure 3 for Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Figure 4 for Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: