Abstract:The fast development of object detection techniques has attracted attention to developing efficient Deep Neural Networks (DNNs). However, the current state-of-the-art DNN models can not provide a balanced solution among accuracy, speed, and model size. This paper proposes an efficient real-time object detection framework on resource-constrained hardware devices through hardware and software co-design. The Tensor Train (TT) decomposition is proposed for compressing the YOLOv5 model. By unitizing the unique characteristics given by the TT decomposition, we develop an efficient hardware accelerator based on FPGA devices. Experimental results show that the proposed method can significantly reduce the model size and improve the execution time.
Abstract:In the field of model compression, choosing an appropriate rank for tensor decomposition is pivotal for balancing model compression rate and efficiency. However, this selection, whether done manually or through optimization-based automatic methods, often increases computational complexity. Manual rank selection lacks efficiency and scalability, often requiring extensive trial-and-error, while optimization-based automatic methods significantly increase the computational burden. To address this, we introduce a novel, automatic, and budget-aware rank selection method for efficient model compression, which employs Layer-Wise Imprinting Quantitation (LWIQ). LWIQ quantifies each layer's significance within a neural network by integrating a proxy classifier. This classifier assesses the layer's impact on overall model performance, allowing for a more informed adjustment of tensor rank. Furthermore, our approach includes a scaling factor to cater to varying computational budget constraints. This budget awareness eliminates the need for repetitive rank recalculations for different budget scenarios. Experimental results on the CIFAR-10 dataset show that our LWIQ improved by 63.2$\%$ in rank search efficiency, and the accuracy only dropped by 0.86$\%$ with 3.2x less model size on the ResNet-56 model as compared to the state-of-the-art proxy-based automatic tensor rank selection method.