Abstract:Deep neural networks (DNNs) offer plenty of challenges in executing efficient computation at edge nodes, primarily due to the huge hardware resource demands. The article proposes HYDRA, hybrid data multiplexing, and runtime layer configurable DNN accelerators to overcome the drawbacks. The work proposes a layer-multiplexed approach, which further reuses a single activation function within the execution of a single layer with improved Fused-Multiply-Accumulate (FMA). The proposed approach works in iterative mode to reuse the same hardware and execute different layers in a configurable fashion. The proposed architectures achieve reductions over 90% of power consumption and resource utilization improvements of state-of-the-art works, with 35.21 TOPSW. The proposed architecture reduces the area overhead (N-1) times required in bandwidth, AF and layer architecture. This work shows HYDRA architecture supports optimal DNN computations while improving performance on resource-constrained edge devices.
Abstract:This paper introduces a Scalable Hierarchical Aware Convolutional Neural Network (SHA-CNN) model architecture for Edge AI applications. The proposed hierarchical CNN model is meticulously crafted to strike a balance between computational efficiency and accuracy, addressing the challenges posed by resource-constrained edge devices. SHA-CNN demonstrates its efficacy by achieving accuracy comparable to state-of-the-art hierarchical models while outperforming baseline models in accuracy metrics. The key innovation lies in the model's hierarchical awareness, enabling it to discern and prioritize relevant features at multiple levels of abstraction. The proposed architecture classifies data in a hierarchical manner, facilitating a nuanced understanding of complex features within the datasets. Moreover, SHA-CNN exhibits a remarkable capacity for scalability, allowing for the seamless incorporation of new classes. This flexibility is particularly advantageous in dynamic environments where the model needs to adapt to evolving datasets and accommodate additional classes without the need for extensive retraining. Testing has been conducted on the PYNQ Z2 FPGA board to validate the proposed model. The results achieved an accuracy of 99.34%, 83.35%, and 63.66% for MNIST, CIFAR-10, and CIFAR-100 datasets, respectively. For CIFAR-100, our proposed architecture performs hierarchical classification with 10% reduced computation while compromising only 0.7% accuracy with the state-of-the-art. The adaptability of SHA-CNN to FPGA architecture underscores its potential for deployment in edge devices, where computational resources are limited. The SHA-CNN framework thus emerges as a promising advancement in the intersection of hierarchical CNNs, scalability, and FPGA-based Edge AI.
Abstract:This paper presents the Hybrid Overestimating Approximate Adder designed to enhance the performance in processing engines, specifically focused on edge AI applications. A novel Plus One Adder design is proposed as an incremental adder in the RCA chain, incorporating a Full Adder with an excess 1 alongside inputs A, B, and Cin. The design approximates outputs to 2 bit values to reduce hardware complexity and improve resource efficiency. The Plus One Adder is integrated into a dynamically reconfigurable HOAA, allowing runtime interchangeability between accurate and approximate overestimation modes. The proposed design is demonstrated for multiple applications, such as Twos complement subtraction and Rounding to even, and the Configurable Activation function, which are critical components of the Processing engine. Our approach shows 21 percent improvement in area efficiency and 33 percent reduction in power consumption, compared to state of the art designs with minimal accuracy loss. Thus, the proposed HOAA could be a promising solution for resource-constrained environments, offering ideal trade-offs between hardware efficiency vs computational accuracy.