Large ML models and datasets have necessitated the use of multi-GPU systems for distributed model training. To harness the power offered by multi-GPU systems, it is critical to eliminate bottlenecks in inter-GPU communication, a problem made challenging by the heterogeneous nature of interconnects. In this work, we present TACCL, a synthesizer of collective communication primitives for large-scale multi-GPU systems. TACCL encodes a profiled topology and input size into a synthesis problem to generate optimized communication algorithms. TACCL is built on top of the standard NVIDIA Collective Communication Library (NCCL), allowing it to be a drop-in replacement for GPU communication in frameworks like PyTorch with minimal changes. TACCL generates algorithms for communication primitives such as Allgather, Alltoall, and Allreduce that are up to $3\times$ faster than NCCL. Using TACCL's algorithms speeds up the end-to-end training of an internal mixture-of-experts model by $17\%$. By decomposing the optimization problem into parts and leveraging the symmetry in multi-GPU topologies, TACCL synthesizes collectives for up to 80 GPUs in less than 3 minutes, at least two orders of magnitude faster than other state-of-the-art synthesis-based collective communication libraries.