Norman A. Rink

PartIR: Composing SPMD Partitioning Strategies for Machine Learning

Jan 23, 2024

Sami Alabed, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke(+6 more)

Abstract: Training of modern large neural networks (NN) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding. When strategies increase in complexity, it becomes necessary for partitioning tools to be 1) expressive, allowing the composition of simpler strategies, and 2) predictable, so that performance can be estimated analytically. We present PartIR, our design for a NN partitioning system. PartIR is focused on an incremental approach to rewriting and is hardware- and runtime-agnostic. We present a simple but powerful API for composing sharding strategies and a simulator to validate them. The process is driven by high-level programmer-issued partitioning tactics, which can be both manual and automatic. Importantly, the tactics are specified separately from the model code, making them easy to change. We evaluate PartIR on several different models to demonstrate its predictability, expressibility, and ability to reach peak performance.

Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR

Oct 07, 2022

Sami Alabed, Dominik Grewe, Juliana Franco, Bart Chrzaszcz, Tom Natan, Tamara Norman, Norman A. Rink, Dimitrios Vytiniotis, Michael Schaarschmidt

Abstract: Large neural network models are commonly trained through a combination of advanced parallelism strategies in the single program, multiple data (SPMD) paradigm. For example, training large transformer models requires combining data, model, and pipeline partitioning with optimizer sharding techniques. However, identifying efficient combinations for many model architectures and accelerator systems requires significant manual analysis. In this work, we present an automatic partitioner that identifies these combinations through a goal-oriented search. Our key finding is that a Monte Carlo Tree Search-based partitioner that incorporates partition-specific compiler analysis directly into the search, together with guided goals, matches expert-level strategies for various models.

Memory-efficient array redistribution through portable collective communication

Dec 02, 2021

Norman A. Rink, Adam Paszke, Dimitrios Vytiniotis, Georg Stefan Schmid

Abstract: Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations using collective operations has also been implemented in the context of the XLA SPMD partitioner, a production-grade tool for partitioning programs across accelerator systems. We evaluate our approach against the XLA implementation and find that our approach delivers a geometric mean speedup of $1.22\times$, with maximum speedups as high as $5.7\times$, while offering provable memory guarantees, making our system particularly appealing for large-scale models.
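
To illustrate why the choice of collective matters for memory during redistribution, here is a minimal pure-Python sketch. It is not the paper's synthesis algorithm and not XLA's implementation: the devices are simulated with plain lists, and every function name (`shard_rows`, `shard_cols`, `redistribute_all_gather`, `redistribute_all_to_all`) is hypothetical. It moves a row-sharded matrix to a column-sharded layout in two ways and reports the largest buffer a single device has to materialize in each case.

```python
# Simulated SPMD devices: each device's memory is a plain Python list of rows.
# All names here are illustrative assumptions, not an API from the paper or XLA.

def shard_rows(matrix, n_dev):
    """Split a matrix into n_dev contiguous row blocks, one per device."""
    rows_per = len(matrix) // n_dev
    return [matrix[d * rows_per:(d + 1) * rows_per] for d in range(n_dev)]

def shard_cols(matrix, n_dev):
    """Split a matrix into n_dev contiguous column blocks, one per device."""
    cols_per = len(matrix[0]) // n_dev
    return [[row[d * cols_per:(d + 1) * cols_per] for row in matrix]
            for d in range(n_dev)]

def redistribute_all_gather(row_shards):
    """Row-sharded -> column-sharded: all-gather, then slice locally.

    Every device materializes the full array before discarding most of it.
    """
    n_dev = len(row_shards)
    full = [row for shard in row_shards for row in shard]  # all-gather result
    largest_buffer = len(full) * len(full[0])              # elements per device
    return shard_cols(full, n_dev), largest_buffer

def redistribute_all_to_all(row_shards):
    """Row-sharded -> column-sharded: each device receives only its columns.

    Mimics an all-to-all: source device `src` sends the column block owned by
    `dst` to device `dst`, so no device ever holds the full array.
    """
    n_dev = len(row_shards)
    cols_per = len(row_shards[0][0]) // n_dev
    out = []
    for dst in range(n_dev):
        received = []
        for src in range(n_dev):
            for row in row_shards[src]:
                received.append(row[dst * cols_per:(dst + 1) * cols_per])
        out.append(received)
    largest_buffer = max(len(s) * len(s[0]) for s in out)  # one output shard
    return out, largest_buffer
```

Both functions produce the same column-sharded result. In this toy model the all-gather variant needs a per-device buffer the size of the whole array, while the all-to-all variant needs only one output shard, which is the kind of difference a memory-efficiency guarantee is concerned with.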
