Abstract: Vision Transformers (ViTs) excel in computer vision tasks but lack the flexibility to meet the diverse needs of edge devices. A key issue is that ViTs pre-trained to cover a broad range of tasks are \textit{over-qualified} for edge devices, which usually demand only part of a ViT's knowledge for specific tasks. Their task-specific accuracy on these edge devices is suboptimal. We find that small ViTs focused on device-specific tasks can improve model accuracy and, at the same time, accelerate model inference. This paper presents NuWa, an approach that derives small ViTs from the base ViT for edge devices with specific task requirements. NuWa transfers task-specific knowledge extracted from the base ViT into small ViTs that fully leverage the constrained resources of edge devices to maximize model accuracy while assuring inference latency. Experiments with three base ViTs on three public datasets demonstrate that, compared with state-of-the-art solutions, NuWa improves model accuracy by up to $11.83\%$ and accelerates model inference by $1.29\times$ to $2.79\times$. Code for reproduction is available at https://anonymous.4open.science/r/Task_Specific-3A5E.
Abstract: Indoor positioning is one of the core technologies of the Internet of Things (IoT) and artificial intelligence (AI), and is expected to play a significant role in the upcoming era of AI. However, owing to the complexity of indoor environments, achieving continuous and reliable indoor positioning remains highly challenging. 5G cellular networks are currently being deployed worldwide, and their new technologies offer new approaches to improving the performance of wireless indoor positioning. In this paper, we investigate indoor positioning under 5G new radio (NR), which has been standardized and is being commercially operated in mass markets. Specifically, we propose a solution and develop a software defined receiver (SDR) for indoor positioning. In our SDR indoor positioning system, the 5G NR signals are first sampled by a universal software radio peripheral (USRP), and coarse synchronization is then achieved by detecting the start of the synchronization signal block (SSB). Next, with the assistance of the pilots transmitted on the physical broadcast channel (PBCH), multipath acquisition and delay tracking are carried out sequentially to estimate the time of arrival (ToA) of the received signals. Furthermore, to improve the ToA ranging accuracy, the carrier phase of the first-arriving path is estimated. Finally, to quantify the accuracy of our ToA estimation method, indoor field tests are carried out in an office environment where a 5G NR base station (known as a gNB) is installed for commercial use. Our test results show that, in the static test scenarios, the ToA accuracy measured by the 1-$\sigma$ error interval is about 0.5 m, while in the pedestrian mobile environment, the range error is within 0.8 m with a probability of 95%.
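The coarse synchronization and ToA-to-range steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' receiver: it assumes a real-valued stand-in for the known synchronization sequence, a simple sliding correlation, and a hypothetical sample rate of 122.88 MHz. The names `coarse_delay` and `delay_to_range` are introduced here for illustration only.

```python
import random

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def coarse_delay(received, reference):
    """Return the sample offset at which the reference correlates most strongly."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(received) - len(reference) + 1):
        # zip stops at the end of the reference, so this is a windowed correlation.
        score = abs(sum(r * ref for r, ref in zip(received[offset:], reference)))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


def delay_to_range(delay_samples, sample_rate_hz):
    """Convert a propagation delay in samples to a distance in metres."""
    return SPEED_OF_LIGHT * delay_samples / sample_rate_hz


# Synthetic test signal: a random +/-1 sequence stands in for the real
# synchronization signal (assumption), delayed by 40 samples plus mild noise.
rng = random.Random(0)
reference = [rng.choice((-1.0, 1.0)) for _ in range(127)]
true_delay = 40
received = [0.0] * true_delay + reference + [0.0] * 30
received = [sample + rng.gauss(0.0, 0.1) for sample in received]

estimated_delay = coarse_delay(received, reference)
range_m = delay_to_range(estimated_delay, 122.88e6)  # assumed sample rate
print(estimated_delay, round(range_m, 2))
```

At the assumed sample rate, one sample of delay corresponds to roughly 2.44 m, which suggests why sample-level synchronization alone is too coarse and why the abstract goes on to delay tracking and carrier-phase estimation.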
Abstract: Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they usually work with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images in under 40-80 ms while achieving high-fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
Abstract: Tracking-by-detection has become an attractive tracking technique, which treats tracking as a category detection problem. However, the task in tracking is to search for a specific object, rather than an object category as in detection. In this paper, we propose a novel tracking framework based on an exemplar detector rather than a category detector. The proposed tracker is an ensemble of exemplar-based linear discriminant analysis (ELDA) detectors. Each detector is highly specific and discriminative because it is trained on a single object instance and a massive number of negatives. To improve the tracker's adaptivity, we update both the object and background models. Experimental results on several challenging video sequences demonstrate the effectiveness and robustness of our tracking algorithm.
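To make the idea of a detector trained on one positive instance and many negatives concrete, here is a minimal sketch of a commonly used exemplar LDA formulation, $w = \Sigma^{-1}(x_e - \mu_0)$, where $\mu_0$ and $\Sigma$ are the mean and covariance of the negatives. The abstract does not give the authors' exact formulation, so the ridge term, the midpoint bias, the toy 2-D features, and the function names (`train_elda`, `score`) are all assumptions made for illustration.

```python
def mean_vector(samples):
    """Per-dimension mean of a list of feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def covariance(samples, mean, ridge=1e-3):
    """Covariance of the samples, with a small ridge (assumption) for stability."""
    n, d = len(samples), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for sample in samples:
        diff = [a - b for a, b in zip(sample, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / n
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (aug[i][d] - tail) / aug[i][i]
    return x


def train_elda(exemplar, negatives):
    """Return (w, b) with w = cov^-1 (exemplar - negative mean)."""
    mu = mean_vector(negatives)
    cov = covariance(negatives, mu)
    w = solve(cov, [e - m for e, m in zip(exemplar, mu)])
    # Assumed bias: decision boundary midway between exemplar and negative mean.
    b = -0.5 * sum(wi * (e + m) for wi, e, m in zip(w, exemplar, mu))
    return w, b


def score(w, b, x):
    """Linear detector response; positive means 'looks like the exemplar'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy 2-D features (hypothetical): one tracked object, several background patches.
negatives = [[0.0, 0.0], [1.0, 0.2], [0.2, 1.0], [1.0, 1.0], [0.5, 0.4]]
exemplar = [3.0, 3.0]
w, b = train_elda(exemplar, negatives)
print(score(w, b, exemplar) > 0, [score(w, b, n) < 0 for n in negatives])
```

Because the negative statistics are shared, adding another exemplar detector to an ensemble only requires one more linear solve, which is one plausible reason this family of detectors suits online tracking.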