
Stylianos I. Venieris

Progressive Mixed-Precision Decoding for Efficient LLM Inference

Oct 17, 2024
Via arXiv

CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads

Sep 02, 2024
Via arXiv

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference

May 28, 2024
Via arXiv

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms

Nov 19, 2023
Via arXiv

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

Oct 17, 2023
Via arXiv

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

Jul 25, 2023
Via arXiv

TinyTrain: Deep Neural Network Training at the Extreme Edge

Jul 19, 2023
Via arXiv

MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge

Jun 22, 2023
Via arXiv

Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices

Jun 20, 2023
Via arXiv

NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution

Dec 15, 2022
Via arXiv