Abstract:On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. To address this, developers often face a trade-off between model accuracy and power consumption, employing either computationally intensive models on high-power cores or pared-down models on low-power cores. Both approaches typically lead to a compromise in user experience (UX). This work focuses on the use of Gated Compression (GC) layer to enhance ODML model performance while conserving power and maximizing cost-efficiency, especially for always-on use cases. GC layers dynamically regulate data flow by selectively gating activations of neurons within the neural network and effectively filtering out non-essential inputs, which reduces power needs without compromising accuracy, and enables more efficient execution on heterogeneous compute cores. These improvements enhance UX through prolonged battery life, improved device responsiveness, and greater user comfort. In this work, we have integrated GC layers into vision and speech domain models including the transformer-based ViT model. Our experiments demonstrate theoretical power efficiency gains ranging from 158x to 30,000x for always-on scenarios. This substantial improvement empowers ODML applications with enhanced UX benefits.
Abstract:On-device machine learning (ODML) enables intelligent applications on resource-constrained devices. However, power consumption poses a major challenge, forcing a trade-off between model accuracy and power efficiency that often limits model complexity. The previously established Gated Compression (GC) layers offer a solution, enabling power efficiency without sacrificing model performance by selectively gating samples that lack signals of interest. However, their reliance on ground truth labels limits GC layers to supervised tasks. This work introduces the Dynamic Switch Layer (DSL), extending the benefits of GC layers to unsupervised learning scenarios, and maintaining power efficiency without the need for labeled data. The DSL builds upon the GC architecture, leveraging a dynamic pathway selection, and adapting model complexity in response to the innate structure of the data. We integrate the DSL into the SoundStream architecture and demonstrate that by routing up to 80% of samples through a lightweight pass we achieve a 12.3x reduction in the amount of computation performed and a 20.9x reduction in model size. This reduces the on-device inference latency by up to 26.5% and improves power efficiency by up to 21.4% without impacting model performance.
Abstract:Positron Emission Tomography scan images are extensively used in radiotherapy planning, clinical diagnosis, assessment of growth and treatment of a tumor. These all rely on fidelity and speed of detection and delineation algorithm. Despite intensive research, segmentation remained a challenging problem due to the diverse image content, resolution, shape, and noise. This paper presents a fast positron emission tomography tumor segmentation method in which superpixels are extracted first from the input image. Principal component analysis is then applied on the superpixels and also on their average. Distance vector of each superpixel from the average is computed in principal components coordinate system. Finally, k-means clustering is applied on distance vector to recognize tumor and non-tumor superpixels. The proposed approach is implemented in MATLAB 2016 which resulted in an average Dice similarity of 84.2% on the dataset. Additionally, a very fast execution time was achieved as the number of superpixels and the size of distance vector on which clustering was done was very small compared to the number of raw pixels in dataset images.
Abstract:In the emerging advancement in the branch of autonomous robotics, the ability of a robot to efficiently localize and construct maps of its surrounding is crucial. This paper deals with utilizing thermal-infrared cameras, as opposed to conventional cameras as the primary sensor to capture images of the robot's surroundings. For localization, the images need to be further processed before feeding them to a navigational system. The main motivation of this paper was to develop an edge detection methodology capable of utilizing the low-SNR poor output from such a thermal camera and effectively detect smooth edges of the surrounding environment. The enhanced edge detector proposed in this paper takes the raw image from the thermal sensor, denoises the images, applies Canny edge detection followed by CSS method. The edges are ranked to remove any noise and only edges of the highest rank are kept. Then, the broken edges are linked by computing edge metrics and a smooth edge of the surrounding is displayed in a binary image. Several comparisons are also made in the paper between the proposed technique and the existing techniques.