Linyan Mei

ACCO: Automated Causal CNN Scheduling Optimizer for Real-Time Edge Accelerators

Jun 11, 2024

Jun Yin, Linyan Mei, Andre Guntoro, Marian Verhelst

Abstract: Spatio-Temporal Convolutional Neural Networks (ST-CNNs) extend CNN capabilities from image processing to recognition of consecutive temporal patterns. Generally, state-of-the-art (SotA) ST-CNNs inflate the feature maps and weights from well-known CNN backbones to represent the additional time dimension. However, edge computing applications would suffer tremendously from such large computation or memory overhead. Fortunately, the overlapping nature of ST-CNNs enables various optimizations, such as the dilated causal convolution structure and Depth-First (DF) layer fusion, which reuse computation between time steps and between CNN sliding windows, respectively. Yet, no hardware-aware approach has been proposed that jointly explores the optimal strategy from a scheduling as well as a hardware point of view. To this end, we present ACCO, an automated optimizer that explores efficient Causal CNN transformation and DF scheduling for ST-CNNs on edge hardware accelerators. By cost-modeling the computation and data movement on the accelerator architecture, ACCO automatically selects the best scheduling strategy for the given hardware-algorithm target. Compared to the fixed dilated causal structure, ST-CNNs with ACCO reach an ~8.4x better Energy-Delay-Product. Meanwhile, ACCO improves layer-fusion optima by ~20% compared to the SotA DF exploration toolchain. When jointly optimizing ST-CNNs on the temporal and spatial dimensions, ACCO's scheduling outcomes are on average 19x faster and 37x more energy-efficient than spatial DF schemes.

* 2023 IEEE 41st International Conference on Computer Design (ICCD), Washington, DC, USA, 2023, pp. 391-398
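
For readers unfamiliar with the dilated causal convolution mentioned in the abstract, the following is a minimal sketch of the textbook 1D form, in which each output depends only on the current and earlier inputs. It is not ACCO's transformation or cost model; the function name `dilated_causal_conv1d` and its parameters are illustrative assumptions.

```python
def dilated_causal_conv1d(x, w, dilation=1):
    """Textbook dilated causal convolution (illustrative, not from the paper).

    y[t] = sum_k w[k] * x[t - k * dilation], treating x as zero before t = 0,
    so y[t] never depends on inputs later than t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that would reach before the first sample
                acc += wk * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Two taps with dilation 2: y[t] = x[t] + x[t - 2]
    print(dilated_causal_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
    # prints [1.0, 2.0, 4.0, 6.0]
```

Because each output looks only backward in time, results computed for earlier time steps stay valid when a new sample arrives, which is the kind of reuse between time steps the abstract refers to.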

SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

Apr 20, 2023

Victor J. B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, Luca Benini

Abstract: To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimum schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler to generate optimal execution schedules for both even and uneven mapping. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop, on 5 different DNNs. On average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.

* 5 pages, 6 figures, open-source at https://github.com/ZigZag-Project/zigzag
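
As a rough illustration of simulated annealing over loop orderings, the sketch below swaps two loops per step and accepts worse orderings with a temperature-dependent probability. It is not SALSA's implementation and does not use the ZigZag API: the loop sizes, the `toy_cost` function, and all parameter values are hypothetical stand-ins for a real energy or latency model, and the exhaustive-search engine described in the abstract is not shown.

```python
import math
import random

# Hypothetical loop dimensions and sizes (illustrative only).
LOOP_SIZES = {"K": 64, "C": 32, "OX": 14, "OY": 14, "FX": 3}


def toy_cost(order):
    """Stand-in cost: larger loops are cheaper at earlier positions.

    A real scheduler would query a hardware cost model here instead.
    """
    return sum(LOOP_SIZES[name] * (pos + 1) for pos, name in enumerate(order))


def anneal(loops, cost, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Simulated annealing over permutations of `loops` using pairwise swaps."""
    rng = random.Random(seed)
    order = list(loops)
    rng.shuffle(order)
    cur = cost(order)
    best, best_cost = list(order), cur
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new = cost(order)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if new < best_cost:
                best, best_cost = list(order), new
        else:
            order[i], order[j] = order[j], order[i]  # undo the rejected swap
    return best, best_cost


if __name__ == "__main__":
    best, best_cost = anneal(LOOP_SIZES, toy_cost)
    print(best, best_cost)
```

Annealing suits this kind of search because the number of orderings grows factorially with the number of loops, whereas a small space like this five-loop example could equally be enumerated exhaustively.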