Íñigo Goiri

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

Sep 25, 2024

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency

Aug 01, 2024

POLCA: Power Oversubscription in LLM Cloud Providers

Aug 24, 2023