Title: Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards

May 10, 2022

Youngeun Kwon, Minsoo Rhu

Abstract: Personalized recommendation models (RecSys) are one of the most popular machine learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its high memory capacity requirement, with model sizes reaching hundreds of GBs to TBs. In RecSys, the so-called embedding layers account for the majority of memory usage, so current systems employ a hybrid CPU-GPU design in which the large CPU memory stores the memory-hungry embedding layers. Unfortunately, training embeddings involves several memory-bandwidth-intensive operations, which is at odds with the slow CPU memory and causes performance overheads. Prior work proposed caching frequently accessed embeddings inside GPU memory as a means to filter down the embedding layer traffic to CPU memory, but this paper observes several limitations with such a cache design. In this work, we present a fundamentally different approach to designing embedding caches for RecSys. Our proposed ScratchPipe architecture utilizes unique properties of RecSys training to develop an embedding cache that sees not only the past but also the "future" cache accesses. ScratchPipe exploits this property to guarantee that the active working set of embedding layers can "always" be captured inside our proposed cache design, enabling embedding layer training to be conducted at GPU memory speed.
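To make the idea of a cache that sees "future" accesses concrete, the toy simulation below may help. It is only a sketch of the general look-ahead principle, not ScratchPipe's actual design, which the abstract does not spell out. It assumes that the embedding IDs of upcoming mini-batches can be inspected before those batches are trained on, and it ignores cache capacity limits and the real CPU-GPU data movement. The names `working_set`, `simulate_lookahead_cache`, and `lookahead` are hypothetical.

```python
from typing import Dict, Sequence, Set


def working_set(batches: Sequence[Sequence[int]], start: int, lookahead: int) -> Set[int]:
    """Union of embedding IDs used by `lookahead` batches beginning at `start`."""
    ids: Set[int] = set()
    for batch in batches[start:start + lookahead]:
        ids.update(batch)
    return ids


def simulate_lookahead_cache(batches: Sequence[Sequence[int]], lookahead: int = 1) -> Dict[str, int]:
    """Count hits and misses for a cache that is filled from future batches.

    Assumption (not taken from the paper): the IDs of the next `lookahead`
    batches are known ahead of time and the cache is large enough to hold them.
    """
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")

    # Cold-start fill: load everything the first window of batches will touch.
    cache = working_set(batches, 0, lookahead)
    stats = {"hits": 0, "misses": 0, "loads": len(cache), "evictions": 0}

    for step, batch in enumerate(batches):
        # Serve the current batch from the cache (stands in for fast memory).
        for emb_id in batch:
            if emb_id in cache:
                stats["hits"] += 1
            else:
                stats["misses"] += 1

        # Look forward: prepare the cache for the window starting at the next step.
        upcoming = working_set(batches, step + 1, lookahead)
        stats["loads"] += len(upcoming - cache)
        stats["evictions"] += len(cache - upcoming)
        cache = upcoming

    return stats


if __name__ == "__main__":
    demo_batches = [[1, 2, 3], [2, 3, 4], [5, 1], [4, 4, 6]]
    print(simulate_lookahead_cache(demo_batches, lookahead=1))
```

Because the cache is always filled from the batches about to be processed, every lookup in this toy model hits. A cache driven only by past access frequency cannot offer that guarantee, since an ID that was rare so far may still appear in the next batch.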

* Accepted for publication at the 49th IEEE/ACM International Symposium on Computer Architecture (ISCA-49), 2022
