Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Iraklis Anagnostopoulos

Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability

Apr 14, 2025

Aikaterini Maria Panteleaki, Konstantinos Balaskas, Georgios Zervakis, Hussam Amrouch, Iraklis Anagnostopoulos

Abstract:As Deep Neural Networks (DNNs) continue to drive advancements in artificial intelligence, the design of hardware accelerators faces growing concerns over embodied carbon footprint due to complex fabrication processes. 3D integration improves performance but introduces sustainability challenges, making carbon-aware optimization essential. In this work, we propose a carbon-efficient design methodology for 3D DNN accelerators, leveraging approximate computing and genetic algorithm-based design space exploration to optimize Carbon Delay Product (CDP). By integrating area-efficient approximate multipliers into Multiply-Accumulate (MAC) units, our approach effectively reduces silicon area and fabrication overhead while maintaining high computational accuracy. Experimental evaluations across three technology nodes (45nm, 14nm, and 7nm) show that our method reduces embodied carbon by up to 30% with negligible accuracy drop.

* Submitted in ISVLSI 2025

Via

Access Paper or Ask Questions

Ecomap: Sustainability-Driven Optimization of Multi-Tenant DNN Execution on Edge Servers

Mar 06, 2025

Varatheepan Paramanayakam, Andreas Karatzas, Dimitrios Stamoulis, Iraklis Anagnostopoulos

Abstract:Edge computing systems struggle to efficiently manage multiple concurrent deep neural network (DNN) workloads while meeting strict latency requirements, minimizing power consumption, and maintaining environmental sustainability. This paper introduces Ecomap, a sustainability-driven framework that dynamically adjusts the maximum power threshold of edge devices based on real-time carbon intensity. Ecomap incorporates the innovative use of mixed-quality models, allowing it to dynamically replace computationally heavy DNNs with lighter alternatives when latency constraints are violated, ensuring service responsiveness with minimal accuracy loss. Additionally, it employs a transformer-based estimator to guide efficient workload mappings. Experimental results using NVIDIA Jetson AGX Xavier demonstrate that Ecomap reduces carbon emissions by an average of 30% and achieves a 25% lower carbon delay product (CDP) compared to state-of-the-art methods, while maintaining comparable or better latency and power efficiency.

* 12 pages, 9 figures, 3 tables

Via

Access Paper or Ask Questions

Multi-Agent Geospatial Copilots for Remote Sensing Workflows

Jan 27, 2025

Chaehong Lee, Varatheepan Paramanayakam, Andreas Karatzas, Yanan Jian, Michael Fore, Heming Liao, Fuxun Yu, Ruopu Li, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Figure 1 for Multi-Agent Geospatial Copilots for Remote Sensing Workflows

Figure 2 for Multi-Agent Geospatial Copilots for Remote Sensing Workflows

Figure 3 for Multi-Agent Geospatial Copilots for Remote Sensing Workflows

Figure 4 for Multi-Agent Geospatial Copilots for Remote Sensing Workflows

Abstract:We present GeoLLM-Squad, a geospatial Copilot that introduces the novel multi-agent paradigm to remote sensing (RS) workflows. Unlike existing single-agent approaches that rely on monolithic large language models (LLM), GeoLLM-Squad separates agentic orchestration from geospatial task-solving, by delegating RS tasks to specialized sub-agents. Built on the open-source AutoGen and GeoLLM-Engine frameworks, our work enables the modular integration of diverse applications, spanning urban monitoring, forestry protection, climate analysis, and agriculture studies. Our results demonstrate that while single-agent systems struggle to scale with increasing RS task complexity, GeoLLM-Squad maintains robust performance, achieving a 17% improvement in agentic correctness over state-of-the-art baselines. Our findings highlight the potential of multi-agent AI in advancing RS workflows.

Via

Access Paper or Ask Questions

Leveraging Highly Approximated Multipliers in DNN Inference

Dec 21, 2024

Georgios Zervakis, Fabio Frustaci, Ourania Spantidi, Iraklis Anagnostopoulos, Hussam Amrouch, Jörg Henkel

Figure 1 for Leveraging Highly Approximated Multipliers in DNN Inference

Figure 2 for Leveraging Highly Approximated Multipliers in DNN Inference

Figure 3 for Leveraging Highly Approximated Multipliers in DNN Inference

Figure 4 for Leveraging Highly Approximated Multipliers in DNN Inference

Abstract:In this work, we present a control variate approximation technique that enables the exploitation of highly approximate multipliers in Deep Neural Network (DNN) accelerators. Our approach does not require retraining and significantly decreases the induced error due to approximate multiplications, improving the overall inference accuracy. As a result, our approach enables satisfying tight accuracy loss constraints while boosting the power savings. Our experimental evaluation, across six different DNNs and several approximate multipliers, demonstrates the versatility of our approach and shows that compared to the accurate design, our control variate approximation achieves the same performance, 45% power reduction, and less than 1% average accuracy loss. Compared to the corresponding approximate designs without using our technique, our approach improves the accuracy by 1.9x on average.

Via

Access Paper or Ask Questions

RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices

Nov 26, 2024

Andreas Karatzas, Dimitrios Stamoulis, Iraklis Anagnostopoulos

Figure 1 for RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices

Figure 2 for RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices

Figure 3 for RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices

Figure 4 for RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices

Abstract:Modern edge data centers simultaneously handle multiple Deep Neural Networks (DNNs), leading to significant challenges in workload management. Thus, current management systems must leverage the architectural heterogeneity of new embedded systems to efficiently handle multi-DNN workloads. This paper introduces RankMap, a priority-aware manager specifically designed for multi-DNN tasks on heterogeneous embedded devices. RankMap addresses the extensive solution space of multi-DNN mapping through stochastic space exploration combined with a performance estimator. Experimental results show that RankMap achieves x3.6 higher average throughput compared to existing methods, while preventing DNN starvation under heavy workloads and improving the prioritization of specified DNNs by x57.5.

* 8 pages, 10 figures, 1 table, Accepted for publication at the 28th Design Automation and Test in Europe Conference (DATE 2025), Best Paper Award Candidate

Via

Access Paper or Ask Questions

Less is More: Optimizing Function Calling for LLM Execution on Edge Devices

Nov 23, 2024

Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Abstract:The advanced function-calling capabilities of foundation models open up new possibilities for deploying agents to perform complex API tasks. However, managing large amounts of data and interacting with numerous APIs makes function calling hardware-intensive and costly, especially on edge devices. Current Large Language Models (LLMs) struggle with function calling at the edge because they cannot handle complex inputs or manage multiple tools effectively. This results in low task-completion accuracy, increased delays, and higher power consumption. In this work, we introduce Less-is-More, a novel fine-tuning-free function-calling scheme for dynamic tool selection. Our approach is based on the key insight that selectively reducing the number of tools available to LLMs significantly improves their function-calling performance, execution time, and power efficiency on edge devices. Experimental results with state-of-the-art LLMs on edge hardware show agentic success rate improvements, with execution time reduced by up to 70% and power consumption by up to 40%.

* Accepted at DATE 2025

Via

Access Paper or Ask Questions

LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Jun 10, 2024

Simranjit Singh, Michael Fore, Andreas Karatzas, Chaehong Lee, Yanan Jian, Longfei Shangguan, Fuxun Yu, Iraklis Anagnostopoulos, Dimitrios Stamoulis

Figure 1 for LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Figure 2 for LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Figure 3 for LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

Abstract:As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, seamlessly integrating with existing function-calling mechanisms. Tested on an industry-scale massively parallel platform that spans hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.

Via

Access Paper or Ask Questions

Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Dec 23, 2023

Konstantinos Balaskas, Andreas Karatzas, Christos Sad, Kostas Siozios, Iraklis Anagnostopoulos, Georgios Zervakis, Jörg Henkel

Figure 1 for Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Figure 2 for Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Figure 3 for Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Figure 4 for Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization

Abstract:Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for running sophisticated DNN-based services on resource constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework to compress DNNs in a hardware-aware manner by jointly employing pruning and quantization. We explore, for the first time, per-layer fine- and coarse-grained pruning, in the same DNN architecture, in addition to low bit-width mixed-precision quantization for weights and activations. Reinforcement Learning (RL) is used to explore the associated design space and identify the pruning-quantization configuration so that the energy consumption is minimized whilst the prediction accuracy loss is retained at acceptable levels. Using our novel composite RL agent we are able to extract energy-efficient solutions without requiring retraining and/or fine tuning. Our extensive experimental evaluation over widely used DNNs and the CIFAR-10/100 and ImageNet datasets demonstrates that our framework achieves $39\%$ average energy reduction for $1.7\%$ average accuracy loss and outperforms significantly the state-of-the-art approaches.

* 14 pages, 9 figures

Via

Access Paper or Ask Questions

OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

Jul 06, 2023

Andreas Karatzas, Iraklis Anagnostopoulos

Figure 1 for OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

Figure 2 for OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

Figure 3 for OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

Figure 4 for OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

Abstract:Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise of multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded system present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller is ought to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration and we combine it with a highly accurate performance estimator to observe a x4.6 average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.

Via

Access Paper or Ask Questions

Energy-efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration

Jul 25, 2022

Ourania Spantidi, Georgios Zervakis, Iraklis Anagnostopoulos, Jörg Henkel

Figure 1 for Energy-efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration

Figure 2 for Energy-efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration

Figure 3 for Energy-efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration

Figure 4 for Energy-efficient DNN Inference on Approximate Accelerators Through Formal Property Exploration

Abstract:Deep Neural Networks (DNNs) are being heavily utilized in modern applications and are putting energy-constraint devices to the test. To bypass high energy consumption issues, approximate computing has been employed in DNN accelerators to balance out the accuracy-energy reduction trade-off. However, the approximation-induced accuracy loss can be very high and drastically degrade the performance of the DNN. Therefore, there is a need for a fine-grain mechanism that would assign specific DNN operations to approximation in order to maintain acceptable DNN accuracy, while also achieving low energy consumption. In this paper, we present an automated framework for weight-to-approximation mapping enabling formal property exploration for approximate DNN accelerators. At the MAC unit level, our experimental evaluation surpassed already energy-efficient mappings by more than $\times2$ in terms of energy gains, while also supporting significantly more fine-grain control over the introduced approximation.

* Accepted for publication at the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems (CASES) 2022. Will appear as part of the ESWEEK-TCAD special issue

Via

Access Paper or Ask Questions