
Baolin Li

LLM Inference Serving: Survey of Recent Advances and Opportunities

Jul 17, 2024

Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference

Mar 19, 2024

Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale

Feb 25, 2024

Synergistic Signals: Exploiting Co-Engagement and Semantic Links via Graph Neural Networks

Dec 07, 2023

From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference

Oct 04, 2023

Building Heterogeneous Cloud System for Machine Learning Inference

Add code
Oct 15, 2022
Figure 1 for Building Heterogeneous Cloud System for Machine Learning Inference
Figure 2 for Building Heterogeneous Cloud System for Machine Learning Inference
Figure 3 for Building Heterogeneous Cloud System for Machine Learning Inference
Figure 4 for Building Heterogeneous Cloud System for Machine Learning Inference
Viaarxiv icon

RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances

Jul 28, 2022

Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models

May 19, 2022

The MIT Supercloud Workload Classification Challenge

Apr 13, 2022

Benchmarking Resource Usage for Efficient Distributed Deep Learning

Jan 28, 2022