Abstract:When executing a deep neural network (DNN), its model parameters are loaded into GPU memory before execution, incurring a significant GPU memory burden. There are studies that reduce GPU memory usage by exploiting CPU memory as a swap device. However, this approach is not applicable in most embedded systems with integrated GPUs where CPU and GPU share a common memory. In this regard, we present Demand Layering, which employs a fast solid-state drive (SSD) as a co-running partner of a GPU and exploits the layer-by-layer execution of DNNs. In our approach, a DNN is loaded and executed in a layer-by-layer manner, minimizing the memory usage to the order of a single layer. Also, we developed a pipeline architecture that hides most additional delays caused by the interleaved parameter loadings alongside layer executions. Our implementation shows a 96.5% memory reduction with just 14.8% delay overhead on average for representative DNNs. Furthermore, by exploiting the memory-delay tradeoff, near-zero delay overhead (under 1 ms) can be achieved with a slightly increased memory usage (still an 88.4% reduction), showing the great potential of Demand Layering.
Abstract:Cyclops, introduced in this paper, is an open research platform for everyone that wants to validate novel ideas and approaches in the area of self-driving heavy-duty vehicle platooning. The platform consists of multiple 1/14 scale semi-trailer trucks, a scale proving ground, and associated computing, communication and control modules that enable self-driving on the proving ground. A perception system for each vehicle is composed of a lidar-based object tracking system and a lane detection/control system. The former is to maintain the gap to the leading vehicle and the latter is to maintain the vehicle within the lane by steering control. The lane detection system is optimized for truck platooning where the field of view of the front-facing camera is severely limited due to a small gap to the leading vehicle. This platform is particularly amenable to validate mitigation strategies for safety-critical situations. Indeed, a simplex structure is adopted in the embedded module for testing various fail safe operations. We illustrate a scenario where camera sensor fails in the perception system but the vehicle operates at a reduced capacity to a graceful stop. Details of the Cyclops including 3D CAD designs and algorithm source codes are released for those who want to build similar testbeds.