Georgios Zervakis

Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability

Apr 14, 2025

Aikaterini Maria Panteleaki, Konstantinos Balaskas, Georgios Zervakis, Hussam Amrouch, Iraklis Anagnostopoulos

Abstract: As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over their embodied carbon footprint, driven by complex fabrication processes. 3D integration improves performance but introduces sustainability challenges, making carbon-aware optimization essential. In this work, we propose a carbon-efficient design methodology for 3D DNN accelerators, leveraging approximate computing and genetic algorithm-based design space exploration to optimize the Carbon Delay Product (CDP). By integrating area-efficient approximate multipliers into Multiply-Accumulate (MAC) units, our approach effectively reduces silicon area and fabrication overhead while maintaining high computational accuracy. Experimental evaluations across three technology nodes (45nm, 14nm, and 7nm) show that our method reduces embodied carbon by up to 30% with negligible accuracy drop.

* Submitted to ISVLSI 2025
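
A minimal sketch of the optimization target, assuming the Carbon Delay Product is computed as embodied carbon multiplied by delay (by analogy with the energy-delay product). The `Design` class, the candidate names, and all numbers below are hypothetical placeholders rather than results from the paper, and the exhaustive `min` over three candidates stands in for the genetic-algorithm search used in the work.

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    embodied_carbon_kg: float  # hypothetical embodied carbon per chip (kgCO2e)
    delay_ms: float            # hypothetical inference latency (ms)

    @property
    def cdp(self) -> float:
        # Carbon Delay Product, assumed here to be carbon x delay (lower is better)
        return self.embodied_carbon_kg * self.delay_ms


def pick_min_cdp(designs: list[Design]) -> Design:
    # Exhaustive selection over a tiny candidate list; a genetic algorithm
    # replaces this when the design space is too large to enumerate.
    return min(designs, key=lambda d: d.cdp)


candidates = [
    Design("exact_mac_2_tiers", 1.20, 4.0),
    Design("approx_mac_2_tiers", 0.90, 4.0),
    Design("approx_mac_1_tier", 0.70, 6.0),
]
best = pick_min_cdp(candidates)
print(best.name, round(best.cdp, 2))  # approx_mac_2_tiers 3.6
```

Here the approximate design with the same delay wins because it lowers carbon without a latency penalty; a design that saves carbon but slows down proportionally more would lose under this metric.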

Compact Yet Highly Accurate Printed Classifiers Using Sequential Support Vector Machine Circuits

Feb 03, 2025

Ilias Sertaridis, Spyridon Besias, Florentia Afentaki, Konstantinos Balaskas, Georgios Zervakis

Abstract: Printed Electronics (PE) technology has emerged as a promising alternative to silicon-based computing. It offers attractive properties such as on-demand ultra-low-cost fabrication, mechanical flexibility, and conformality. However, PE are governed by large feature sizes, prohibiting the realization of complex printed Machine Learning (ML) classifiers. Leveraging PE's ultra-low non-recurring engineering and fabrication costs, designers can fully customize hardware to a specific ML model and dataset, significantly reducing circuit complexity. Despite significant advancements, state-of-the-art solutions achieve area efficiency at the expense of considerable accuracy loss. Our work mitigates this by designing area- and power-efficient printed ML classifiers with little to no accuracy degradation. Specifically, we introduce the first sequential Support Vector Machine (SVM) classifiers, exploiting the hardware efficiency of bespoke control and storage units and a single Multiply-Accumulate compute engine. Our SVMs yield on average 6x lower area and 4.6% higher accuracy compared to the printed state of the art.

* Accepted at the 2025 IEEE International Symposium on Circuits and Systems (ISCAS), May 25-28 2025, London, UK
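
The sequential organization can be illustrated with a small sketch: instead of instantiating one multiplier per SVM coefficient, a single multiply-accumulate unit is reused over successive cycles. This assumes a one-vs-rest linear SVM with integer (fixed-point) coefficients and an argmax over class scores; the function name and parameters are hypothetical, and the bespoke control and storage units are abstracted into the Python loops.

```python
def sequential_svm_predict(x, weights, biases):
    """One-vs-rest linear SVM evaluated with a single multiply-accumulate unit.

    weights[c][i] and biases[c] are integer (fixed-point) coefficients.
    Each inner-loop iteration models one MAC cycle.
    Returns (predicted_class, mac_cycles).
    """
    best_class, best_score = None, None
    cycles = 0
    for c, (w_c, b_c) in enumerate(zip(weights, biases)):
        acc = b_c  # accumulator register starts from the class bias
        for w, xi in zip(w_c, x):
            acc += w * xi  # the single shared MAC engine
            cycles += 1
        # a comparator and register keep track of the best class score so far
        if best_score is None or acc > best_score:
            best_class, best_score = c, acc
    return best_class, cycles


x = [3, 1, 2]
weights = [[1, 0, -1], [0, 2, 1], [-1, 1, 0]]
biases = [0, -1, 2]
print(sequential_svm_predict(x, weights, biases))  # (1, 9)
```

The class scores here are 1, 3 and 0, so class 1 is selected after 9 MAC cycles: latency grows with the number of coefficients, while the arithmetic hardware stays at one MAC.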

Late Breaking Results: Energy-Efficient Printed Machine Learning Classifiers with Sequential SVMs

Jan 28, 2025

Spyridon Besias, Ilias Sertaridis, Florentia Afentaki, Konstantinos Balaskas, Georgios Zervakis

Abstract: Printed Electronics (PE) provide a mechanically flexible and cost-effective solution for machine learning (ML) circuits, compared to silicon-based technologies. However, due to large feature sizes, printed classifiers are limited by high power, area, and energy overheads, which restricts the realization of battery-powered systems. In this work, we design sequential printed bespoke Support Vector Machine (SVM) circuits that adhere to the power constraints of existing printed batteries while minimizing energy consumption, thereby boosting battery life. Our results show 6.5x energy savings while maintaining higher accuracy compared to the state of the art.

* Accepted at the Design, Automation and Test in Europe Conference (DATE'25), March 31 - April 2, 2025

Leveraging Highly Approximated Multipliers in DNN Inference

Dec 21, 2024

Georgios Zervakis, Fabio Frustaci, Ourania Spantidi, Iraklis Anagnostopoulos, Hussam Amrouch, Jörg Henkel

Abstract: In this work, we present a control variate approximation technique that enables the exploitation of highly approximate multipliers in Deep Neural Network (DNN) accelerators. Our approach does not require retraining and significantly decreases the induced error due to approximate multiplications, improving the overall inference accuracy. As a result, our approach enables satisfying tight accuracy loss constraints while boosting the power savings. Our experimental evaluation, across six different DNNs and several approximate multipliers, demonstrates the versatility of our approach and shows that compared to the accurate design, our control variate approximation achieves the same performance, 45% power reduction, and less than 1% average accuracy loss. Compared to the corresponding approximate designs without using our technique, our approach improves the accuracy by 1.9x on average.
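
To illustrate why a cheap correction term can recover much of the error of a highly approximate multiplier, here is a simplified sketch in the spirit of a control variate; it is not the paper's exact formulation. It assumes a hypothetical approximate multiplier that ignores the k least-significant bits of each weight, so every product is short by `a * (w mod 2**k)`. Because weights are fixed at inference time, their mean residual can be precomputed, and the accumulated error is then estimated from the running sum of the activations, which requires no per-product correction.

```python
import random


def truncated_product(a: int, w: int, k: int) -> int:
    # Hypothetical approximate multiplier: ignores the k LSBs of the weight
    return a * ((w >> k) << k)


def dot_with_correction(acts, weights, k):
    exact = sum(a * w for a, w in zip(acts, weights))
    approx = sum(truncated_product(a, w, k) for a, w in zip(acts, weights))
    # Each product misses a * (w mod 2**k). The weights are known offline, so
    # the mean residual r_mean is a precomputed constant; the total error is
    # then estimated as r_mean * sum(acts), using a single extra accumulation.
    mask = (1 << k) - 1
    r_mean = sum(w & mask for w in weights) / len(weights)
    corrected = approx + r_mean * sum(acts)
    return exact, approx, corrected


rng = random.Random(0)
acts = [rng.randrange(256) for _ in range(64)]     # unsigned 8-bit activations
weights = [rng.randrange(256) for _ in range(64)]  # unsigned 8-bit weights
exact, approx, corrected = dot_with_correction(acts, weights, k=4)
print("error without correction:", abs(exact - approx))
print("error with correction:   ", round(abs(exact - corrected), 1))
```

The uncorrected error is systematically biased (always an underestimate), which is exactly the part a mean-based correction removes; what remains is the zero-mean fluctuation of the residuals around their average.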

Reducing ADC Front-end Costs During Training of On-sensor Printed Multilayer Perceptrons

Nov 14, 2024

Florentia Afentaki, Paula Carolina Lozano Duarte, Georgios Zervakis, Mehdi B. Tahoori

Abstract: Printed electronics technology offers a cost-effective and fully customizable solution to computational needs beyond the capabilities of traditional silicon technologies, offering advantages such as on-demand manufacturing and conformal, low-cost hardware. However, the low-resolution fabrication of printed electronics, which results in large feature sizes, poses a challenge for integrating complex designs like those of machine learning (ML) classification systems. Current literature optimizes only the Multilayer Perceptron (MLP) circuit within the classification system, while the cost of analog-to-digital converters (ADCs) is overlooked. Printed applications frequently require on-sensor processing, yet while the digital classifier has been extensively optimized, the analog-to-digital interfacing, specifically the ADCs, dominates the total area and energy consumption. In this work, we target digital printed MLP classifiers and propose the design of customized ADCs per MLP input, which involves minimizing the number of distinct values represented for each input, thus simplifying the ADC circuitry. Incorporating this ADC optimization into MLP training enables eliminating ADC levels and the respective comparators while still maintaining high classification accuracy. Our approach achieves 11.2x lower ADC area for less than 5% accuracy drop across varying MLPs.

* This article is accepted for publication in IEEE Embedded Systems Letters
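
A minimal sketch of the comparator savings, assuming a flash-style ADC in which every threshold is implemented by one comparator and the output code counts the thresholds the input exceeds. The threshold grid, the retained indices, and the helper names `flash_adc_code` and `prune_levels` are hypothetical; in the work above, which levels survive is decided during MLP training, which this sketch does not model.

```python
def flash_adc_code(v: float, thresholds: list[float]) -> int:
    # One comparator per threshold; the output (thermometer) code is the
    # number of thresholds the input exceeds.
    return sum(v > t for t in thresholds)


def prune_levels(thresholds: list[float], keep: set[int]) -> list[float]:
    # Hypothetical helper: keep only the thresholds whose index is in `keep`.
    # Every dropped threshold removes one comparator (and one output level).
    return [t for i, t in enumerate(thresholds) if i in keep]


# 4-bit flash ADC over a normalized input range [0, 1): 15 thresholds
thresholds = [(i + 1) / 16 for i in range(15)]
kept = prune_levels(thresholds, keep={3, 7, 11})  # hypothetical choice

print(len(thresholds), "->", len(kept), "comparators")  # 15 -> 3 comparators
print(flash_adc_code(0.6, thresholds), flash_adc_code(0.6, kept))  # 9 2
```

The input 0.6 is still digitized, only more coarsely (level 2 of 4 instead of level 9 of 16); whether that coarser view is enough for the classifier is the question the in-training optimization answers per input.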

Design and In-training Optimization of Binary Search ADC for Flexible Classifiers

Oct 02, 2024

Paula Carolina Lozano Duarte, Florentia Afentaki, Georgios Zervakis, Mehdi B. Tahoori

Abstract: Flexible Electronics (FE) offer distinct advantages, including mechanical flexibility and low process temperatures, enabling extremely low-cost production. To address the demands of applications such as smart sensors and wearables, flexible devices must be small and operate at low supply voltages. Additionally, target applications often require classifiers to operate directly on analog sensory input, necessitating the use of Analog-to-Digital Converters (ADCs) to process the sensory data. However, ADCs present serious challenges in terms of high area and power consumption, especially under stringent area and energy budgets. In this work, we target common classifiers in this domain, such as MLPs and SVMs, and present a holistic approach to mitigate the elevated overhead of analog-to-digital interfacing in FE. First, we propose a novel Binary Search ADC design that reduces area overhead by 2X compared with the state-of-the-art Binary design and by up to 5.4X compared with a Flash ADC. Next, we present an in-training ADC optimization in which we keep only the bare-minimum representations required and simplify the ADCs by removing unnecessary components. Our in-training optimization further reduces the area of the required ADCs, in terms of transistor count, by 5X on average for less than 1% accuracy loss.

* Accepted for publication at the 30th Asia and South Pacific Design Automation Conference (ASPDAC '25). doi: https://doi.org/10.1145/3658617.3697715
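
The core idea of a binary search conversion can be sketched as follows, assuming 2^n - 1 sorted thresholds for an n-bit output: each conversion resolves the code with n sequential comparisons instead of comparing against every threshold at once as a flash ADC does. This models only the comparison sequence; the function name and thresholds are hypothetical, and it says nothing about the transistor-level design proposed in the paper.

```python
def binary_search_adc(v: float, thresholds: list[float]) -> tuple[int, int]:
    """Resolve the output code by binary search over sorted thresholds.

    Returns (code, comparisons), where code is the number of thresholds
    that v exceeds (the same code a flash ADC would produce).
    """
    lo, hi = 0, len(thresholds)  # the code lies in [lo, hi]
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if v > thresholds[mid]:  # one comparator decision per step
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


# 3-bit example: 2**3 - 1 = 7 thresholds, so 3 comparisons per conversion
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
for v in (0.0, 3.2, 10.0):
    print(v, binary_search_adc(v, thresholds))
```

For these inputs the codes are 0, 3 and 7, each resolved in exactly 3 comparisons; a flash converter would evaluate all 7 thresholds for every sample.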

TransAxx: Efficient Transformers with Approximate Computing

Feb 12, 2024

Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel

Abstract: Vision Transformer (ViT) models, which were recently introduced based on the transformer architecture, have been shown to be very competitive and have often become a popular alternative to Convolutional Neural Networks (CNNs). However, the high computational requirements of these models limit their practical applicability, especially on low-power devices. The current state of the art employs approximate multipliers to address the highly increased compute demands of DNN accelerators, but no prior research has explored their use on ViT models. In this work, we propose TransAxx, a framework based on the popular PyTorch library that enables fast inherent support for approximate arithmetic to seamlessly evaluate the impact of approximate computing on DNNs such as ViT models. Using TransAxx, we analyze the sensitivity of transformer models on the ImageNet dataset to approximate multiplications and perform approximate-aware finetuning to regain accuracy. Furthermore, we propose a methodology to generate approximate accelerators for ViT models. Our approach uses a Monte Carlo Tree Search (MCTS) algorithm to efficiently search the space of possible configurations using a hardware-driven, hand-crafted policy. Our evaluation demonstrates the efficacy of our methodology in achieving significant trade-offs between accuracy and power, resulting in substantial gains without compromising on performance.
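
For readers unfamiliar with MCTS, the sketch below shows the standard UCT selection rule (UCB1 applied to tree nodes), which balances exploiting configurations with a high average reward against exploring rarely visited ones. It is a generic illustration: the paper's hardware-driven, hand-crafted policy is not reproduced here, and interpreting each child as, for example, a candidate approximate multiplier for the next layer with a reward combining accuracy and power is an assumption.

```python
import math


def uct_select(children, exploration=math.sqrt(2)):
    """Return the index of the child maximizing the UCT score.

    `children` is a list of (total_reward, visits) pairs; an unvisited
    child is returned immediately so every option is tried at least once.
    """
    parent_visits = sum(n for _, n in children)
    best_idx, best_score = None, -math.inf
    for idx, (reward, visits) in enumerate(children):
        if visits == 0:
            return idx
        # average reward (exploitation) + confidence bonus (exploration)
        score = reward / visits + exploration * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


print(uct_select([(3.0, 5), (2.0, 2), (0.0, 0)]))  # 2: unvisited child first
print(uct_select([(4.0, 5), (1.0, 5)]))            # 0: equal visits, higher mean
print(uct_select([(5.0, 10), (0.4, 1)]))           # 1: exploration bonus dominates
```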

Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs

Feb 05, 2024

Florentia Afentaki, Michael Hefenbrock, Georgios Zervakis, Mehdi B. Tahoori

Abstract: Printed Electronics (PE) stands out as a promising technology for widespread computing due to its distinct attributes, such as low costs and flexible manufacturing. Unlike traditional silicon-based technologies, PE enables stretchable, conformal, and non-toxic hardware. However, PE are constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. Approximate computing has been proven to reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs). In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the MLP training process. Due to the discrete nature of hardware approximation, we propose and implement a genetic-based, approximate, hardware-aware training approach specifically designed for printed MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction compared to the baseline while outperforming state-of-the-art approximate and stochastic printed MLPs.

* Accepted for publication at the 27th Design, Automation and Test in Europe Conference (DATE'24), Mar 25-27 2024, Valencia, Spain
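
Why a genetic algorithm suits discrete hardware approximations can be sketched with a toy example: genomes index into a small set of allowed weight values, so gradient-based training does not apply directly, but selection, crossover, and mutation do. Everything here is a hypothetical stand-in: the `ALLOWED` value set, the count of nonzero weights used as an area proxy, the penalty weight, and the synthetic dataset are assumptions, not the paper's encoding or cost model.

```python
import random

ALLOWED = [-4, -2, -1, 0, 1, 2, 4]  # hypothetical hardware-friendly weight values


def predict(genome, x):
    return 1 if sum(ALLOWED[g] * xi for g, xi in zip(genome, x)) > 0 else 0


def fitness(genome, data, area_weight=0.05):
    acc = sum(predict(genome, x) == y for x, y in data) / len(data)
    area = sum(ALLOWED[g] != 0 for g in genome)  # hypothetical area proxy
    return acc - area_weight * area


def evolve(data, n_genes, pop_size=20, generations=30, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(ALLOWED)) for _ in range(n_genes)] for _ in range(pop_size)]
    initial_best = max(fitness(g, data) for g in pop)
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data), reverse=True)
        next_pop = [list(g) for g in pop[:2]]  # elitism: keep the two best genomes
        while len(next_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=lambda g: fitness(g, data))
            p2 = max(rng.sample(pop, 3), key=lambda g: fitness(g, data))
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            # mutation: resample a gene from the discrete set with probability p_mut
            child = [rng.randrange(len(ALLOWED)) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda g: fitness(g, data))
    return best, initial_best


rng = random.Random(1)
data = []
for _ in range(40):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, 1 if x[0] + 2 * x[1] - x[2] > 0 else 0))
best, initial_best = evolve(data, n_genes=3)
print([ALLOWED[g] for g in best], round(fitness(best, data), 3), round(initial_best, 3))
```

Keeping the two best genomes in every generation (elitism) guarantees that the best fitness never decreases, and the area term pushes the search toward genomes that set unneeded weights to zero.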

Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons

Dec 29, 2023

Florentia Afentaki, Gurol Saglam, Argyris Kokkinis, Kostas Siozios, Georgios Zervakis, Mehdi B Tahoori

Abstract: Printed Electronics (PE) feature distinct and remarkable characteristics that make them a prominent technology for achieving true ubiquitous computing. This is particularly relevant in application domains that require conformal and ultra-low-cost solutions, which have experienced limited penetration of computing until now. Unlike silicon-based technologies, PE offer unparalleled features such as low non-recurring engineering costs, ultra-low manufacturing cost, and on-demand fabrication of conformal, flexible, non-toxic, and stretchable hardware. However, PE face certain limitations due to their large feature sizes, which impede the realization of complex circuits, such as machine learning classifiers. In this work, we address these limitations by leveraging the principles of Approximate Computing and Bespoke (fully customized) design. We propose an automated framework for designing ultra-low-power Multilayer Perceptron (MLP) classifiers, which employs, for the first time, a holistic approach to approximate all functions of the MLP's neurons: multiplication, accumulation, and activation. Through comprehensive evaluation across MLPs of varying size, our framework demonstrates the ability to enable battery-powered operation of even the most intricate MLP architecture examined, significantly surpassing the current state of the art.

* Accepted for publication at the IEEE/ACM International Conference on Computer Aided Design (ICCAD) 2023, San Francisco, USA

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Dec 23, 2023

Konstantinos Balaskas, Andreas Karatzas, Christos Sad, Kostas Siozios, Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel

Abstract: Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time there is a vast demand for running sophisticated DNN-based services on resource-constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework to compress DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning in the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration that minimizes energy consumption while keeping the prediction accuracy loss at acceptable levels. Using our novel composite RL agent, we are able to extract energy-efficient solutions without requiring retraining and/or fine-tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves 39% average energy reduction for 1.7% average accuracy loss and significantly outperforms the state-of-the-art approaches.

* 14 pages, 9 figures
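
The per-layer operations such a framework chooses between can be sketched in plain Python: fine-grained (unstructured) magnitude pruning, coarse-grained (filter-level) pruning by L1 norm, and uniform symmetric quantization to a chosen bit width. These are common textbook variants assumed for illustration; the RL agent that selects a configuration per layer and the paper's exact pruning and quantization schemes are not reproduced.

```python
def prune_fine(weights, sparsity):
    """Fine-grained (unstructured) pruning: zero the smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def prune_coarse(filters, sparsity):
    """Coarse-grained (structured) pruning: zero whole filters with the smallest L1 norm."""
    k = int(len(filters) * sparsity)
    order = sorted(range(len(filters)), key=lambda i: sum(abs(w) for w in filters[i]))
    pruned = [list(f) for f in filters]
    for i in order[:k]:
        pruned[i] = [0.0] * len(filters[i])
    return pruned


def quantize_symmetric(weights, bits):
    """Uniform symmetric quantization to signed `bits`-bit integers with one scale per layer."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


# Layer A: fine-grained pruning at 50% sparsity, then 4-bit weights
layer_a = [0.9, -0.05, 0.3, -0.6, 0.02, 0.4]
q_a, scale_a = quantize_symmetric(prune_fine(layer_a, 0.5), bits=4)
print(q_a)  # [7, 0, 0, -5, 0, 3]

# Layer B: coarse-grained pruning removes half of the filters
layer_b = [[0.1, -0.1], [1.0, 0.5], [-0.2, 0.05], [0.7, -0.7]]
print(prune_coarse(layer_b, 0.5))  # [[0.0, 0.0], [1.0, 0.5], [0.0, 0.0], [0.7, -0.7]]
```

Fine-grained pruning keeps the most important individual weights but leaves an irregular sparsity pattern, whereas coarse-grained pruning removes whole filters that an accelerator can skip outright; choosing between them per layer, together with the bit width, is the design space the RL agent explores.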
