
Chaojie Zhang

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations

Sep 25, 2024

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency

Aug 01, 2024

MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging

May 05, 2024

Estimation of Time-to-Total Knee Replacement Surgery

Apr 29, 2024

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference

Mar 29, 2024

POLCA: Power Oversubscription in LLM Cloud Providers

Aug 24, 2023