Zdenek Vasicek

Arbitrary Precision Printed Ternary Neural Networks with Holistic Evolutionary Approximation

Aug 27, 2025

Vojtech Mrazek, Konstantinos Balaskas, Paula Carolina Lozano Duarte, Zdenek Vasicek, Mehdi B. Tahoori, Georgios Zervakis

Abstract: Printed electronics offer a promising alternative for applications beyond silicon-based systems that require properties like flexibility, stretchability, conformality, and ultra-low fabrication costs. Despite the large feature sizes in printed electronics, printed neural networks have attracted attention for meeting target application requirements, though realizing complex circuits remains challenging. This work bridges the gap between classification accuracy and area efficiency in printed neural networks, covering the entire processing-near-sensor system design and co-optimization from the analog-to-digital interface (a major area and power bottleneck) to the digital classifier. We propose an automated framework for designing printed Ternary Neural Networks with arbitrary input precision, utilizing multi-objective optimization and holistic approximation. Our circuits outperform existing approximate printed neural networks by 17x in area and 59x in power on average, and are the first to enable printed-battery-powered operation with under 5% accuracy loss while accounting for analog-to-digital interfacing costs.

* Accepted at IEEE Transactions on Circuits and Systems for Artificial Intelligence

TinyverseGP: Towards a Modular Cross-domain Benchmarking Framework for Genetic Programming

Apr 14, 2025

Roman Kalkreuth, Fabricio Olivetti de França, Julian Dierkes, Marie Anastacio, Anja Jankovic, Zdenek Vasicek, Holger Hoos

Abstract: Over the years, genetic programming (GP) has evolved, with many proposed variations, especially in how they represent a solution. Being essentially a program synthesis algorithm, it is capable of tackling multiple problem domains. Current benchmarking initiatives are fragmented, as the different representations are not compared with each other and their performance is not measured across the different domains. In this work, we propose a unified framework, dubbed TinyverseGP (inspired by tinyGP), which supports multiple representations and problem domains, including symbolic regression, logic synthesis and policy search.

* Accepted for presentation as a poster at the Genetic and Evolutionary Computation Conference (GECCO'25), July 14-18, 2025, Málaga, Spain; to appear in the GECCO'25 Companion

Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators

Apr 08, 2024

Jan Klhufek, Miroslav Safar, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina

Abstract: Energy efficiency and memory footprint of a convolutional neural network (CNN) implemented on a CNN inference accelerator depend on many factors, including a weight quantization strategy (i.e., data types and bit-widths) and mapping (i.e., placement and scheduling of DNN elementary operations on hardware units of the accelerator). We show that enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings that utilize the hardware resources more effectively. CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations. To find, analyze, and exploit these mappings, we: (i) extend a general-purpose state-of-the-art mapping tool (Timeloop) to support mixed quantization, which is not currently available; (ii) propose an efficient multi-objective optimization algorithm to find the most suitable bit-widths and mapping for each DNN layer executed on the accelerator; and (iii) conduct a detailed experimental evaluation to validate the proposed method. On two CNNs (MobileNetV1 and MobileNetV2) and two accelerators (Eyeriss and Simba) we show that for a given quality metric (such as the accuracy on ImageNet), energy savings are up to 37% without any accuracy drop.

* To appear at the 2024 27th International Symposium on Design & Diagnostics of Electronic Circuits & Systems (DDECS)

Semantically-Oriented Mutation Operator in Cartesian Genetic Programming for Evolutionary Circuit Design

Apr 23, 2020

David Hodan, Vojtech Mrazek, Zdenek Vasicek

Abstract: Despite many successful applications, Cartesian Genetic Programming (CGP) suffers from limited scalability, especially when used for evolutionary circuit design. Considering the multiplier design problem, for example, the 5x5-bit multiplier represents the most complex circuit evolved from a randomly generated initial population. The efficiency of CGP depends heavily on the performance of the point mutation operator; however, this operator is purely stochastic. This contrasts with the recent developments in Genetic Programming (GP), where advanced informed approaches such as semantic-aware operators are incorporated to improve the search space exploration capability of GP. In this paper, we propose a semantically-oriented mutation operator (SOMO) suitable for the evolutionary design of combinational circuits. SOMO uses semantics to determine the best value for each mutated gene. Compared to the common CGP and its variants as well as the recent versions of Semantic GP, the proposed method converges on common Boolean benchmarks substantially faster while keeping the phenotype size relatively small. The successfully evolved instances presented in this paper include 10-bit parity, 10+10-bit adder and 5x5-bit multiplier. The most complex circuits were evolved in less than one hour with a single-thread implementation running on a common CPU.
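
For readers unfamiliar with the baseline this abstract contrasts against, the following is a minimal sketch of a purely stochastic CGP point mutation. It is not SOMO and is not taken from the paper. The flat genome layout (three genes per node, then output genes), the gate set `FUNCS`, and the names `point_mutation` and `is_valid` are all assumptions made for illustration.

```python
import random

FUNCS = ("and", "or", "xor", "nand")  # hypothetical gate set, not from the paper


def is_valid(genome, n_inputs, n_nodes):
    """Check that every gene lies in its legal range (feed-forward wiring)."""
    for node in range(n_nodes):
        a, b, f = genome[3 * node: 3 * node + 3]
        limit = n_inputs + node  # a node may read primary inputs or earlier nodes
        if not (0 <= a < limit and 0 <= b < limit and 0 <= f < len(FUNCS)):
            return False
    # Output genes may point at any primary input or node.
    return all(0 <= o < n_inputs + n_nodes for o in genome[3 * n_nodes:])


def point_mutation(genome, n_inputs, n_nodes, rng, n_mutations=1):
    """Return a copy of the genome with randomly chosen genes resampled.

    Each chosen gene gets a uniformly random legal value; no information
    about circuit behaviour is used, which is what "purely stochastic" means.
    The new value may coincide with the old one.
    """
    child = list(genome)
    for _ in range(n_mutations):
        idx = rng.randrange(len(child))
        if idx >= 3 * n_nodes:            # output gene
            child[idx] = rng.randrange(n_inputs + n_nodes)
        elif idx % 3 == 2:                # function gene
            child[idx] = rng.randrange(len(FUNCS))
        else:                             # connection gene of node idx // 3
            child[idx] = rng.randrange(n_inputs + idx // 3)
    return child


# Assumed toy genome: 2 inputs, 3 nodes (in_a, in_b, func), 1 output.
parent = [0, 1, 2,  0, 2, 0,  2, 3, 1,  4]
child = point_mutation(parent, n_inputs=2, n_nodes=3, rng=random.Random(0))
```

A semantics-aware operator would replace the uniform `rng.randrange` draws with a choice informed by the circuit's behaviour; how SOMO makes that choice is described in the paper, not here.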

* Accepted for Genetic and Evolutionary Computation Conference (GECCO '20), July 8-12, 2020, Cancún, Mexico

Adaptive Verifiability-Driven Strategy for Evolutionary Approximation of Arithmetic Circuits

Mar 05, 2020

Milan Ceska, Jiri Matyas, Vojtech Mrazek, Lukas Sekanina, Zdenek Vasicek, Tomas Vojnar

Abstract: We present a novel approach for designing complex approximate arithmetic circuits that trade correctness for power consumption and play an important role in many energy-aware applications. Our approach uniquely integrates formal methods, which provide formal guarantees on the approximation error, into an evolutionary circuit optimisation algorithm. The key idea is to employ a novel adaptive search strategy that drives the evolution towards promptly verifiable approximate circuits. As demonstrated in an extensive experimental evaluation including several structurally different arithmetic circuits and target precisions, the search strategy provides superior scalability and versatility with respect to various approximation scenarios. Our approach significantly improves the capabilities of existing methods and paves the way towards an automated design process for provably correct circuit approximations.

TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU

Feb 21, 2020

Filip Vaverka, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina

Abstract: Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid the expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on CPU or GPU and operating with standard floating-point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs, so these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for approximate circuits utilized in a given DNN accelerator which is emulated on GPU. All relevant approximate circuits are implemented as look-up tables and accessed through a texture memory mechanism of CUDA-capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehw-fit/tf-approximate.
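
The look-up-table idea in this abstract can be sketched on the CPU in plain Python. This is only an illustration of replacing an approximate multiplier with a table read; it does not use CUDA texture memory and is not the tf-approximate API. The helper names (`build_lut`, `lut_dot`) and the `truncated_mul` circuit model are hypothetical.

```python
def build_lut(mul, bits=8):
    """Tabulate a two-operand unsigned multiplier over all 2^bits x 2^bits inputs."""
    size = 1 << bits
    return [[mul(a, b) for b in range(size)] for a in range(size)]


def lut_dot(lut, activations, weights):
    """Dot product in which every multiplication is a table look-up."""
    return sum(lut[a][w] for a, w in zip(activations, weights))


def truncated_mul(a, b):
    # Hypothetical approximate multiplier: ignores the two LSBs of each operand.
    return (a & ~0b11) * (b & ~0b11)


exact_lut = build_lut(lambda a, b: a * b)
approx_lut = build_lut(truncated_mul)

acts = [3, 200, 17]   # assumed 8-bit unsigned activations
wts = [5, 2, 255]     # assumed 8-bit unsigned weights
exact_result = lut_dot(exact_lut, acts, wts)
approx_result = lut_dot(approx_lut, acts, wts)
```

Once the table exists, the cost of one emulated multiplication no longer depends on how complex the approximate circuit is; it is a single indexed read, which is the access pattern the abstract says texture memory is optimized for.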

* To appear at the 23rd Design, Automation and Test in Europe (DATE 2020). Grenoble, France

ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining

Jul 25, 2019

Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina, Muhammad Abdullah Hanif, Muhammad Shafique

Abstract: State-of-the-art approaches employ approximate computing to reduce the energy consumption of DNN hardware. Approximate DNNs then require extensive retraining to recover from the accuracy loss caused by the use of approximate operations. However, retraining of complex DNNs does not scale well. In this paper, we demonstrate that efficient approximations can be introduced into the computational path of DNN accelerators while retraining can be avoided completely. ALWANN provides highly optimized implementations of DNNs for custom low-power accelerators in which the number of computing units is lower than the number of DNN layers. First, a fully trained DNN is converted to operate with 8-bit weights and 8-bit multipliers in convolutional layers. A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized. The optimizations, including the multiplier selection problem, are solved by means of the multi-objective optimization algorithm NSGA-II. In order to completely avoid the computationally expensive retraining of DNN, which is usually employed to improve the classification accuracy, we propose a simple weight updating scheme that compensates for the inaccuracy introduced by employing approximate multipliers. The proposed approach is evaluated for two architectures of DNN accelerators with approximate multipliers from the open-source "EvoApprox" library. We report that the proposed approach saves 30% of energy needed for multiplication in convolutional layers of ResNet-50 while the accuracy is degraded by only 0.6%. The proposed technique and approximate layers are available as an open-source extension of TensorFlow at https://github.com/ehw-fit/tf-approximate.

* Accepted for 2019 IEEE/ACM International Conference On Computer-Aided Design (ICCAD'19)

autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components

Apr 01, 2019

Vojtech Mrazek, Muhammad Abdullah Hanif, Zdenek Vasicek, Lukas Sekanina, Muhammad Shafique

Abstract: Approximate computing is an emerging paradigm for developing highly energy-efficient computing systems such as various accelerators. In the literature, many libraries of elementary approximate circuits have already been proposed to simplify the design process of approximate accelerators. Because these libraries contain from tens to thousands of approximate implementations for a single arithmetic operation, it is intractable to find an optimal combination of approximate circuits in the library even for an application consisting of a few operations. An open problem is "how to effectively combine circuits from these libraries to construct complex approximate accelerators". This paper proposes a novel methodology for searching, selecting and combining the most suitable approximate circuits from a set of available libraries to generate an approximate accelerator for a given application. To enable fast design space generation and exploration, the methodology utilizes machine learning techniques to create computational models estimating the overall quality of processing and hardware cost without performing full synthesis at the accelerator level. Using the methodology, we construct hundreds of approximate accelerators (for a Sobel edge detector) showing different but relevant tradeoffs between the quality of processing and hardware cost and identify a corresponding Pareto frontier. Furthermore, when searching for approximate implementations of a generic Gaussian filter consisting of 17 arithmetic operations, the proposed approach allows us to identify approximately $10^3$ highly important implementations from $10^{23}$ possible solutions in a few hours, while the exhaustive search would take four months on a high-end processor.
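
The Pareto frontier mentioned in this abstract is the set of designs that no other design beats on both objectives at once. A minimal sketch of that filtering step follows, assuming each candidate is an `(error, cost)` pair with both values minimized; the function name `pareto_front` and the sample values are made up for illustration and are not results from the paper.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    q dominates p when q is no worse in both objectives and strictly
    better in at least one (both objectives are minimized).
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Made-up (quality loss, hardware cost) pairs for six candidate accelerators.
candidates = [(0.0, 10.0), (0.1, 7.0), (0.2, 7.5), (0.3, 4.0), (0.3, 5.0), (0.5, 4.0)]
front = pareto_front(candidates)
```

This quadratic scan is fine for hundreds of candidates. It says nothing about how the candidates are generated or estimated, which is where the methodology described in the abstract does its work.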

* Accepted for publication at the Design Automation Conference 2019 (DAC'19), Las Vegas, Nevada, USA

Automated Circuit Approximation Method Driven by Data Distribution

Mar 11, 2019

Zdenek Vasicek, Vojtech Mrazek, Lukas Sekanina

Abstract: We propose an application-tailored, data-driven, fully automated method for functional approximation of combinational circuits. We demonstrate how an application-level error metric such as the classification accuracy can be translated to a component-level error metric needed for an efficient and fast search in the space of approximate low-level components that are used in the application. This is possible by employing a weighted mean error distance (WMED) metric for steering the circuit approximation process, which is conducted by means of genetic programming. WMED introduces a set of weights (calculated from the data distribution measured on a selected signal in a given application) determining the importance of each input vector for the approximation process. The method is evaluated using synthetic benchmarks and application-specific approximate MAC (multiply-and-accumulate) units that are designed to provide the best trade-offs between the classification accuracy and power consumption of two image classifiers based on neural networks.
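
One plausible reading of a weighted mean error distance is the weighted average of the absolute difference between exact and approximate outputs over all input vectors. The sketch below follows that reading; the normalization by the total weight, the function name `wmed`, and the toy 4-bit multiplier are assumptions, and the paper's exact definition may differ.

```python
from itertools import product


def wmed(exact, approx, inputs, weight):
    """Weighted mean of |exact(x) - approx(x)| over the given input vectors.

    weight(*x) gives the importance of input vector x; weights are
    normalized here so that they sum to one (an assumption of this sketch).
    """
    total = sum(weight(*x) for x in inputs)
    return sum(weight(*x) * abs(exact(*x) - approx(*x)) for x in inputs) / total


def exact_mul(a, b):
    return a * b


def approx_mul(a, b):
    # Hypothetical approximate multiplier: ignores the LSB of the first operand.
    return (a & ~1) * b


inputs = list(product(range(16), repeat=2))  # all pairs of 4-bit operands

# Uniform weights: every input vector matters equally.
uniform = wmed(exact_mul, approx_mul, inputs, lambda a, b: 1.0)

# Skewed weights: the application only ever presents even first operands.
skewed = wmed(exact_mul, approx_mul, inputs, lambda a, b: 1.0 if a % 2 == 0 else 0.0)
```

Under uniform weights the approximation has a nonzero error, while under the skewed distribution the same circuit is error-free because its errors fall only on inputs that never occur. That is the effect that lets a data distribution steer the approximation toward inputs that matter less.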

* Accepted for publication at Design, Automation and Test in Europe (DATE 2019). Florence, Italy
