Mark Wilkening

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

May 22, 2021

Udit Gupta, Samuel Hsia, Jeff Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin S. Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks

Abstract: Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations requiring specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened via RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to prior art and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3x and 6x.

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Jan 29, 2021

Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, Gu-Yeon Wei

Abstract: Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters requiring large memory capacities. Unfortunately, large and fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but have worse read latency and bandwidth, degrading inference performance. RecSSD is a near-data-processing-based SSD memory system customized for neural recommendation inference that reduces end-to-end model inference latency by 2X compared to using COTS SSDs across eight industry-representative models.
