Yujeong Choi

PREBA: A Hardware/Software Co-Design for Multi-Instance GPU based AI Inference Servers

Nov 28, 2024
Via arXiv

ElasticRec: A Microservice-based Model Serving Architecture Enabling Elastic Resource Scaling for Recommendation Models

Jun 11, 2024
Via arXiv

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

Feb 23, 2023
Via arXiv

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

Feb 27, 2022
Via arXiv

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

Oct 25, 2020
Via arXiv

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Nov 15, 2019
Via arXiv

PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

Sep 06, 2019
Via arXiv