Title: Baechi: Fast Device Placement of Machine Learning Graphs

Jan 20, 2023

Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta

Abstract: Machine learning graphs (or models) can be challenging or impossible to train when devices have limited memory or models are large. To split the model across devices, learning-based approaches are still popular. While these result in model placements that train fast on data (i.e., low step times), learning-based model parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654× to 206K× faster than state-of-the-art learning-based approaches, and (ii) the step (training) time of Baechi-placed models is comparable to expert placements in PyTorch, and only up to 6.2% worse than expert placements in TensorFlow. We prove mathematically that our two algorithms are within a constant factor of the optimal. Our work shows that, compared to learning-based approaches, algorithmic approaches can face different challenges in adapting to machine learning systems, but they also offer proven bounds and significant performance benefits.

* Extended version of SoCC 2020 paper: https://dl.acm.org/doi/10.1145/3419111.3421302
