Iakovos S. Venieris

MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference

Dec 05, 2024

Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

Abstract: Cascade systems, consisting of a lightweight model that processes all samples and a heavier, high-accuracy model that refines challenging samples, have become a widely adopted distributed inference approach for achieving high accuracy while keeping the computational burden on mobile and IoT devices low. As intelligent indoor environments, such as smart homes, continue to expand, a new scenario is emerging: the multi-device cascade. In this setting, multiple diverse devices simultaneously use a shared heavy model hosted on a server, often situated within or close to the consumer environment. This work introduces MultiTASC++, a continuously adaptive, multi-tenancy-aware scheduler that dynamically controls the forwarding decision functions of the devices to optimize system throughput while maintaining high accuracy and low latency. Through extensive experimentation in diverse device environments and with varying server-side models, we demonstrate the scheduler's efficacy in consistently maintaining a targeted satisfaction rate while providing the highest available accuracy across different device tiers and workloads of up to 100 devices. These results highlight its scalability and efficiency in addressing the unique challenges of collaborative DNN inference in dynamic and diverse IoT environments.

* ITU Journal on Future and Evolving Technologies, Volume 5 (2024), Issue 1, Pages 26-46
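
The forwarding decision functions mentioned above can be illustrated with a minimal sketch. It assumes, as is common in cascade systems but not stated in the abstract, that a device forwards a sample when the light model's top-1 softmax confidence falls below a per-device threshold, and that a scheduler nudges all thresholds down or up depending on whether the observed SLO satisfaction rate meets a target. The names `Device`, `should_forward` and `adapt_thresholds`, and the fixed step size, are hypothetical; this is not the MultiTASC++ scheduling policy itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """A client device running the lightweight model (hypothetical structure)."""
    name: str
    threshold: float  # forward a sample when top-1 confidence < threshold


def should_forward(probs: List[float], threshold: float) -> bool:
    """Forward the sample to the shared heavy model if the light model is unsure."""
    return max(probs) < threshold


def adapt_thresholds(devices: List[Device], satisfaction: float, target: float,
                     step: float = 0.05) -> None:
    """Simplified controller, assumed for illustration only.

    If the measured SLO satisfaction rate is below target, the server is
    treated as overloaded, so every threshold is lowered (fewer samples are
    forwarded). Otherwise thresholds are raised so that more hard samples
    reach the more accurate heavy model. Thresholds stay within [0, 1].
    """
    delta = -step if satisfaction < target else step
    for device in devices:
        device.threshold = min(1.0, max(0.0, device.threshold + delta))


if __name__ == "__main__":
    fleet = [Device("camera", 0.80), Device("speaker", 0.60)]
    print(should_forward([0.55, 0.30, 0.15], fleet[0].threshold))  # True: 0.55 < 0.80
    adapt_thresholds(fleet, satisfaction=0.90, target=0.95)  # SLO target missed
    print([round(d.threshold, 2) for d in fleet])  # [0.75, 0.55]
```

A single global step keeps the sketch short; a multi-tenancy-aware scheduler would presumably treat device tiers differently rather than shifting every threshold by the same amount.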

CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads

Sep 02, 2024

Ioannis Panopoulos, Stylianos I. Venieris, Iakovos S. Venieris

Abstract: The relentless expansion of deep learning applications in recent years has prompted a pivotal shift toward on-device execution, driven by the urgent need for real-time processing, heightened privacy concerns, and reduced latency across diverse domains. This article addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices, with a focus on device heterogeneity, multi-DNN execution, and dynamic runtime adaptation. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications under user-defined service-level objectives (SLOs). Leveraging an expressive multi-objective optimisation (MOO) framework and a runtime-aware sorting and search algorithm (RASS) as the MOO solver, CARIn adapts efficiently to dynamic conditions while addressing the resource contention issues associated with multi-DNN execution. Notably, RASS generates a set of configurations in anticipation of subsequent runtime adaptation, ensuring rapid, low-overhead adjustments in response to environmental fluctuations. Extensive evaluation across diverse tasks, including text classification, scene recognition, and face analysis, showcases the versatility of CARIn across various model architectures, such as Convolutional Neural Networks and Transformers, and realistic use cases. We observe a substantial improvement in the fair treatment of the problem's objectives, reaching 1.92x compared with single-model designs and up to 10.69x compared with the state-of-the-art OODIn framework. Additionally, we achieve a gain of up to 4.06x over hardware-unaware designs in multi-DNN applications. Finally, our framework sustains its performance while effectively eliminating the time overhead of identifying the optimal design in response to environmental changes.

* ACM Transactions on Embedded Computing Systems, Volume 23, Issue 4, Article 60 (July 2024), 32 pages
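
The abstract does not describe RASS in enough detail to reproduce it, but the idea of precomputing a set of configurations so that runtime adaptation reduces to a cheap lookup can be sketched with a generic two-objective Pareto filter. The `Config` type, the configuration names and all numbers below are made up for illustration; CARIn itself handles more objectives and multi-DNN resource contention.

```python
from typing import List, NamedTuple, Optional


class Config(NamedTuple):
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.accuracy >= b.accuracy
    strictly_better = a.latency_ms < b.latency_ms or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Offline step: keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def pick(front: List[Config], latency_budget_ms: float) -> Optional[Config]:
    """Runtime step: most accurate precomputed design that meets the current budget."""
    feasible = [c for c in front if c.latency_ms <= latency_budget_ms]
    return max(feasible, key=lambda c: c.accuracy, default=None)


if __name__ == "__main__":
    candidates = [Config("A", 10, 0.70), Config("B", 20, 0.80),
                  Config("C", 25, 0.78), Config("D", 40, 0.85)]
    front = pareto_front(candidates)  # C is dominated by B
    print([c.name for c in front])    # ['A', 'B', 'D']
    print(pick(front, 30).name)       # B
    print(pick(front, 15).name)       # A, after the latency budget tightens
```

Because the front is computed once, a change in conditions (here, a tighter latency budget) only triggers a scan of a few stored designs instead of a new search.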

Near-Field Beamforming for Stacked Intelligent Metasurfaces-assisted MIMO Networks

Aug 03, 2024

Anastasios Papazafeiropoulos, Pandelis Kourtessis, Symeon Chatzinotas, Dimitra I. Kaklamani, Iakovos S. Venieris

Abstract: Stacked intelligent metasurfaces (SIMs) have recently gained significant interest because they enable precoding in the wave domain, which comes with increased processing capability and reduced energy consumption. The combination of SIMs and high-frequency propagation makes the study of their performance in the near field of crucial importance. Hence, in this work, we focus on SIM-assisted multiuser multiple-input multiple-output (MIMO) systems operating in the near-field region. To this end, we formulate the weighted sum rate maximisation problem in terms of the transmit power and the phase shifts of the SIM. Using a block coordinate descent (BCD)-based algorithm, we obtain numerical results that show the enhanced performance of the SIM in the near field compared with the far field.

* 5 pages, accepted in IEEE WCL
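
For orientation, a generic weighted sum rate problem of the kind described above can be written as follows, where $K$ is the number of users, $w_k$ the weight, $p_k$ the transmit power and $\gamma_k$ the SINR of user $k$, $\theta^{l}_{n}$ the phase shift of element $n$ in SIM layer $l$, and $P_{\max}$ the power budget. The second line contrasts near- and far-field propagation for an element at offset $\delta_n d$ along a linear array and a user at distance $r$ and angle $\vartheta$. Both lines are an assumed textbook template: the letter's exact SINR expression and array geometry are not given in the abstract. A BCD scheme would alternate between the power block and each layer's phase block while holding the others fixed.

```latex
\max_{\{p_k\},\,\{\theta^{l}_{n}\}} \; \sum_{k=1}^{K} w_k \log_2\!\left(1+\gamma_k\right)
\quad \text{s.t.} \quad \sum_{k=1}^{K} p_k \le P_{\max}, \; p_k \ge 0, \; \theta^{l}_{n} \in [0, 2\pi)

r_n = \sqrt{r^2 + \delta_n^2 d^2 - 2 r \delta_n d \sin\vartheta}
\;\approx\; r - \delta_n d \sin\vartheta \quad \text{(far-field, planar-wavefront approximation)}
```

The near-field model keeps the exact spherical-wave distance $r_n$, so the channel depends on both $r$ and $\vartheta$; the far-field approximation retains only the linear term and therefore only the angle.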

Achievable Rate Optimization for Large Stacked Intelligent Metasurfaces Based on Statistical CSI

May 29, 2024

Anastasios Papazafeiropoulos, Pandelis Kourtessis, Symeon Chatzinotas, Dimitra I. Kaklamani, Iakovos S. Venieris

Abstract: A stacked intelligent metasurface (SIM) is an emerging design that consists of multiple layers of metasurfaces. A SIM enables holographic multiple-input multiple-output (HMIMO) precoding in the wave domain, which reduces energy consumption and hardware cost. In the context of multiuser beamforming, this letter focuses on the downlink achievable rate and its maximization. Unlike previous works on multiuser SIMs, we consider statistical channel state information (CSI) rather than instantaneous CSI to overcome challenges such as large overhead. We also examine the performance of large surfaces. We apply an alternating optimization (AO) algorithm with respect to the phases of the SIM and the allocated transmit power. Simulations illustrate the performance of the considered large SIM-assisted design and compare the different CSI assumptions.

* accepted in IEEE Wireless Communications Letters
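
The multi-layer structure described above is often modelled in the SIM literature as a cascade of diagonal unit-modulus phase matrices interleaved with inter-layer propagation matrices, $G = \Phi_L W_L \cdots \Phi_2 W_2 \Phi_1$. The sketch below assumes that model (the abstract does not confirm it is the one used in this letter) and computes the cascaded response in plain Python; the function names are hypothetical and the propagation matrices are simply taken as given.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain complex matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def phase_layer(thetas: List[float]) -> Matrix:
    """Diagonal matrix of unit-modulus coefficients exp(j*theta_n) for one layer."""
    n = len(thetas)
    return [[cmath.exp(1j * thetas[i]) if i == j else 0j for j in range(n)]
            for i in range(n)]


def sim_response(phases: List[List[float]], propagation: List[Matrix]) -> Matrix:
    """Cascaded response G = Phi_L W_L ... Phi_2 W_2 Phi_1 (assumed model).

    phases[l] holds the phase shifts of layer l+1, and propagation[l] is the
    inter-layer matrix from layer l+1 to layer l+2.
    """
    if len(propagation) != len(phases) - 1:
        raise ValueError("need one propagation matrix between consecutive layers")
    g = phase_layer(phases[0])
    for thetas, w in zip(phases[1:], propagation):
        g = matmul(phase_layer(thetas), matmul(w, g))
    return g


if __name__ == "__main__":
    identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
    g = sim_response([[cmath.pi / 2, 0.0], [cmath.pi / 2, 0.0]], [identity])
    print(g[0][0], g[1][1])  # approximately -1 and 1: the two pi/2 shifts add up
```

An AO loop in the spirit of the abstract would alternate between updating `phases` with the power allocation fixed and updating the power with the phases fixed; that loop and the statistical-CSI rate expression it maximises are not reproduced here.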

MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge

Jun 22, 2023

Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

Abstract: Cascade systems comprise a two-model sequence, with a lightweight model processing all samples and a heavier, higher-accuracy model conditionally refining harder samples to improve accuracy. By placing the light model on the device side and the heavy model on a server, model cascades constitute a widely used distributed inference approach. With the rapid expansion of intelligent indoor environments, such as smart homes, the new setting of the Multi-Device Cascade is emerging, in which multiple, diverse devices simultaneously use a shared heavy model on the same server, typically located within or close to the consumer environment. This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the forwarding decision functions of the devices to maximize system throughput while sustaining high accuracy and low latency. By explicitly considering device heterogeneity, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods in highly heterogeneous setups while serving over 40 devices, showcasing its scalability.

* Accepted at 28th IEEE Symposium on Computers and Communications (ISCC), 2023
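
The improvement above is reported in percentage points, which is not the same as a relative percentage gain. The short sketch below computes an SLO satisfaction rate as the share of requests whose latency does not exceed the SLO, an assumed definition since the abstract does not spell it out, and shows the difference on made-up latencies.

```python
from typing import Sequence


def slo_satisfaction_rate(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Percentage of requests whose end-to-end latency meets the SLO (assumed: <=)."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    met = sum(1 for t in latencies_ms if t <= slo_ms)
    return 100.0 * met / len(latencies_ms)


if __name__ == "__main__":
    baseline = slo_satisfaction_rate([80, 120, 95, 150], slo_ms=100)  # 2 of 4 met: 50.0
    improved = slo_satisfaction_rate([80, 90, 95, 150], slo_ms=100)   # 3 of 4 met: 75.0
    print(f"{improved - baseline:.0f} pp")  # 25 pp, which is a 50% relative gain
```

Reporting in pp avoids the ambiguity of relative gains, which look larger when the baseline rate is low.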

Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices

Jun 20, 2023

Ioannis Panopoulos, Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

Abstract: Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years and driving the field's advancement. At the same time, the ever-increasing use of mobile devices (MDs) has resulted in a surge of DNN-based mobile applications. Although traditional architectures, like CNNs and RNNs, have been successfully integrated into MDs, this is not the case for Transformers, a relatively new model family that has achieved new levels of accuracy across AI tasks but poses significant computational challenges. In this work, we take steps towards bridging this gap by examining the current state of Transformers' on-device execution. To this end, we construct a benchmark of representative models and thoroughly evaluate their performance across MDs with different computational capabilities. Our experimental results show that Transformers are not accelerator-friendly and indicate the need for software and hardware optimisations to achieve efficient deployment.

* Accepted at the 3rd IEEE International Workshop on Distributed Intelligent Systems (DistInSys), 2023
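
On-device benchmarks like the one described above usually discard warm-up runs and report robust statistics over repeated inferences. The harness below is a generic sketch of that practice, not the authors' benchmarking code; `run_inference` is a placeholder for whatever invokes the model (for example a TensorFlow Lite `Interpreter.invoke()` call), and the warm-up and run counts are arbitrary.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(run_inference: Callable[[], object], warmup: int = 5,
              runs: int = 30) -> Dict[str, float]:
    """Time repeated calls of `run_inference`, discarding warm-up iterations.

    Warm-up calls absorb one-off costs such as lazy initialisation and cold
    caches, which would otherwise inflate the first measurements.
    """
    if runs < 1:
        raise ValueError("runs must be positive")
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    ordered = sorted(samples)
    p90 = ordered[math.ceil(0.9 * len(ordered)) - 1]  # nearest-rank 90th percentile
    return {"median_ms": statistics.median(ordered), "p90_ms": p90, "runs": float(runs)}


if __name__ == "__main__":
    # Stand-in CPU workload in place of a real model invocation.
    print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

The median and a tail percentile are less sensitive to scheduler noise and thermal throttling on mobile devices than a single mean.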

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

Jun 21, 2021

Stylianos I. Venieris, Ioannis Panopoulos, Ilias Leontiadis, Iakovos S. Venieris

Abstract: The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity devices faces significant challenges: large computational cost, multiple performance objectives, hardware heterogeneity, and a common need for high accuracy together pose critical problems for the deployment of DNNs across the various embedded and mobile devices in the wild. As a result, we have yet to witness the mainstream usage of state-of-the-art deep learning algorithms across consumer devices. In this paper, we provide preliminary answers to the potentially game-changing question of how to reach real-time AI on consumer devices by presenting an array of design techniques for efficient AI systems. We start by examining the major roadblocks when targeting both programmable processors and custom accelerators. Then, we present diverse methods for achieving real-time performance following a cross-stack approach. These span model-, system- and hardware-level techniques, and their combination. Our findings provide illustrative examples of AI systems that do not overburden mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we showcase how custom ASIC- and FPGA-based accelerators can be an enabling factor for next-generation AI applications, such as multi-DNN systems. Collectively, these results highlight the critical need for further exploration of how the various cross-stack solutions can best be combined to bring the latest advances in deep learning close to users in a robust and efficient manner.

* Invited paper at the 32nd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), 2021

OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices

Jun 08, 2021

Stylianos I. Venieris, Ioannis Panopoulos, Iakovos S. Venieris

Abstract: Radical progress in the field of deep learning (DL) has led to unprecedented accuracy in diverse inference tasks. As such, deploying DL models across mobile platforms is vital to enable the development and broad availability of next-generation intelligent apps. Nevertheless, the wide and optimised deployment of DL models is currently hindered by the vast system heterogeneity of mobile devices, the varying computational cost of different DL models and the variability of performance needs across DL applications. This paper proposes OODIn, a framework for the optimised deployment of DL apps across heterogeneous mobile devices. OODIn comprises a novel DL-specific software architecture together with an analytical framework for modelling DL applications that: (1) counteract the variability in device resources and DL models by means of a highly parametrised multi-layer design; and (2) perform a principled optimisation of both model- and system-level parameters through a multi-objective formulation, designed for DL inference apps, in order to adapt the deployment to the user-specified performance requirements and device capabilities. Quantitative evaluation shows that the proposed framework consistently outperforms status-quo designs across heterogeneous devices and delivers up to 4.3x and 3.5x performance gains over highly optimised platform- and model-aware designs, respectively, while effectively adapting execution to dynamic changes in resource availability.

* Accepted at the 7th IEEE International Conference on Smart Computing (SMARTCOMP), 2021
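
Optimising model- and system-level parameters jointly, as described above, can be pictured as a search over the cross product of those parameters, filtered by device capabilities and user requirements. The sketch below enumerates a tiny hypothetical design space (model size, numeric precision, compute unit) against a made-up profiling table and returns the fastest configuration that meets an accuracy requirement. It is not OODIn's analytical model, whose parameters and formulation the abstract does not list.

```python
import itertools
from typing import Optional, Set, Tuple

MODELS = ("small", "large")
PRECISIONS = ("fp32", "int8")
UNITS = ("cpu", "gpu")

# Hypothetical measurements for one device: (latency in ms, top-1 accuracy).
PROFILE = {
    ("small", "fp32", "cpu"): (30.0, 0.70), ("small", "fp32", "gpu"): (12.0, 0.70),
    ("small", "int8", "cpu"): (15.0, 0.69), ("small", "int8", "gpu"): (14.0, 0.69),
    ("large", "fp32", "cpu"): (120.0, 0.80), ("large", "fp32", "gpu"): (45.0, 0.80),
    ("large", "int8", "cpu"): (60.0, 0.79), ("large", "int8", "gpu"): (50.0, 0.79),
}


def best_config(available_units: Set[str],
                min_accuracy: float) -> Optional[Tuple[str, str, str]]:
    """Lowest-latency configuration that runs on an available unit and meets the accuracy floor."""
    candidates = []
    for cfg in itertools.product(MODELS, PRECISIONS, UNITS):
        if cfg[2] not in available_units:
            continue  # device capability filter
        latency, accuracy = PROFILE[cfg]
        if accuracy >= min_accuracy:  # user-specified requirement
            candidates.append((latency, cfg))
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(best_config({"cpu", "gpu"}, 0.75))  # ('large', 'fp32', 'gpu')
    print(best_config({"cpu"}, 0.75))         # ('large', 'int8', 'cpu') once the GPU is busy
```

The second call mimics a dynamic change in resource availability: when the GPU is no longer usable, re-running the same selection yields a different deployment.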

PerceptionNet: A Deep Convolutional Neural Network for Late Sensor Fusion

Nov 01, 2018

Panagiotis Kasnesis, Charalampos Z. Patrikakis, Iakovos S. Venieris

Abstract: Human Activity Recognition (HAR) based on motion sensors has drawn a lot of attention over the last few years, since perceiving the human status enables context-aware applications to adapt their services to users' needs. However, motion sensor fusion and feature extraction have not yet reached their full potential and remain an open issue. In this paper, we introduce PerceptionNet, a deep Convolutional Neural Network (CNN) that applies a late 2D convolution to multimodal time-series sensor data in order to automatically extract efficient features for HAR. We evaluate our approach on two publicly available HAR datasets to demonstrate that the proposed model effectively fuses multimodal sensor data and improves the performance of HAR. In particular, PerceptionNet surpasses the performance of state-of-the-art HAR methods based on: (i) hand-crafted features, (ii) deep CNNs exploiting early fusion approaches, and (iii) Long Short-Term Memory (LSTM) networks, by more than 3% in average accuracy.

* This article has been accepted for publication in the proceedings of Intelligent Systems Conference (IntelliSys) 2018
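
The idea of a late 2D convolution can be illustrated with shapes alone: if each sensor axis is a row and time runs along columns, early kernels of height 1 filter every axis separately, and a later kernel whose height spans all rows fuses the modalities in one step. The sketch below is a single-filter, activation-free toy under that assumption; PerceptionNet's actual layer count, kernel sizes and filter counts are not given in the abstract, and `conv2d_valid` is a hypothetical helper.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(x: Grid, kernel: Grid) -> Grid:
    """Single-channel 2D cross-correlation with no padding, as in most DL frameworks."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


# 6 rows = sensor axes (e.g. accelerometer x/y/z and gyroscope x/y/z), 8 time steps.
signals = [[float(axis)] * 8 for axis in range(6)]

# Early stage: a 1x3 kernel filters each axis along time without mixing axes.
early = conv2d_valid(signals, [[1 / 3, 1 / 3, 1 / 3]])
# Late stage: a 6x2 kernel spans every axis, fusing all modalities at once.
late = conv2d_valid(early, [[1.0, 1.0] for _ in range(6)])

print(len(early), len(early[0]))  # 6 6
print(len(late), len(late[0]))    # 1 5
```

An early-fusion design would instead mix the axes in the first layer; deferring the cross-axis kernel lets the first layers learn per-signal temporal features before fusion.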
