Fadi Kurdahi

ADDT -- A Digital Twin Framework for Proactive Safety Validation in Autonomous Driving Systems

Apr 13, 2025

Bo Yu, Chaoran Yuan, Zishen Wan, Jie Tang, Fadi Kurdahi, Shaoshan Liu

Abstract:Autonomous driving systems continue to face safety-critical failures, often triggered by rare and unpredictable corner cases that evade conventional testing. We present the Autonomous Driving Digital Twin (ADDT) framework, a high-fidelity simulation platform designed to proactively identify hidden faults, evaluate real-time performance, and validate safety before deployment. ADDT combines realistic digital models of driving environments, vehicle dynamics, sensor behavior, and fault conditions to enable scalable, scenario-rich stress-testing under diverse and adverse conditions. It supports adaptive exploration of edge cases using reinforcement-driven techniques, uncovering failure modes that physical road testing often misses. By shifting from reactive debugging to proactive simulation-driven validation, ADDT enables a more rigorous and transparent approach to autonomous vehicle safety engineering. To accelerate adoption and facilitate industry-wide safety improvements, the entire ADDT framework has been released as open-source software, providing developers with an accessible and extensible tool for comprehensive safety testing at scale.

SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors

Nov 26, 2024

Mariam Rakka, Jinhao Li, Guohao Dai, Ahmed Eltawil, Mohammed E. Fouda, Fadi Kurdahi

Abstract:Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators like Softmax and Layernorm remain bottlenecks due to their sensitivity to quantization. We propose SoftmAP, a software-hardware co-design methodology that implements an integer-only low-precision Softmax using In-Memory Compute (IMC) hardware. Our method achieves up to three orders of magnitude improvement in the energy-delay product compared to A100 and RTX3090 GPUs, making LLMs more deployable without compromising performance.

* Accepted in DATE 2025
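
To make the idea of an integer-only Softmax concrete, here is a minimal sketch. It is not the SoftmAP algorithm and does not model associative-processor hardware. It is a base-2 approximation that assumes the logits are already quantized to integers in log2 units, and the function name and the parameters `frac_bits` and `out_bits` are hypothetical.

```python
def int_softmax_base2(q, frac_bits=16, out_bits=8):
    """Integer-only base-2 softmax sketch (illustrative, not the paper's method).

    q: list of integer logits, assumed to be expressed in log2 units.
    Returns integers approximating probabilities scaled by 2**out_bits.
    """
    q_max = max(q)
    # 2**(q_i - q_max) in fixed point: shift right by the non-negative gap.
    nums = [(1 << frac_bits) >> (q_max - qi) for qi in q]
    total = sum(nums)
    # Integer division keeps the whole computation free of floating point.
    return [(n << out_bits) // total for n in nums]


print(int_softmax_base2([0, 0]))     # two equal logits share the mass equally
print(int_softmax_base2([3, 1, 0]))  # larger logits receive larger shares
```

Only shifts, additions, and one integer division per element are used, which is the kind of operation mix that low-precision hardware tends to favor.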

BF-IMNA: A Bit Fluid In-Memory Neural Architecture for Neural Network Acceleration

Nov 03, 2024

Mariam Rakka, Rachid Karami, Ahmed M. Eltawil, Mohammed E. Fouda, Fadi Kurdahi

Abstract:Mixed-precision quantization of Neural Networks (NNs) is gaining traction because it enables efficient hardware realization, leading to higher throughput and lower energy. In-Memory Computing (IMC) accelerator architectures are offered as alternatives to traditional architectures; they rely on a data-centric computational paradigm, diminish the memory wall problem, and achieve high throughput and energy efficiency. These accelerators can support static fixed-precision but are not flexible enough to support mixed-precision NNs. In this paper, we present BF-IMNA, a bit fluid IMC accelerator for end-to-end Convolutional NN (CNN) inference that is capable of static and dynamic mixed-precision without any hardware reconfiguration overhead at run-time. At the heart of BF-IMNA are Associative Processors (APs), which are bit-serial word-parallel Single Instruction, Multiple Data (SIMD)-like engines. We report the performance of end-to-end inference of ImageNet on AlexNet, VGG16, and ResNet50 on BF-IMNA for different technologies (eNVM and NVM), mixed-precision configurations, and supply voltages. To demonstrate bit fluidity, we implement HAWQ-V3's per-layer mixed-precision configurations for ResNet18 on BF-IMNA using different latency budgets, and results reveal a trade-off between accuracy and Energy-Delay Product (EDP): On one hand, mixed-precision with a high latency constraint achieves the closest accuracy to fixed-precision INT8 and reports a higher (worse) EDP compared to fixed-precision INT4. On the other hand, with a low latency constraint, BF-IMNA reports the closest EDP to fixed-precision INT4, with a higher degradation in accuracy compared to fixed-precision INT8. We also show that BF-IMNA with fixed-precision configuration still delivers performance that is comparable to current state-of-the-art accelerators: BF-IMNA achieves 20% higher energy efficiency and 2% higher throughput.

AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition

Sep 20, 2023

Mohamad Fakih, Rouwaida Kanj, Fadi Kurdahi, Mohammed E. Fouda

Abstract:Automatic Speech Recognition (ASR) systems have been shown to be vulnerable to adversarial attacks that manipulate the command executed on the device. Recent research has focused on exploring methods to create such attacks; however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the needed properties of robust attacks compatible with the OTA model, and we design a method of generating attacks with such desired properties, namely the invariance to synchronization and the robustness to filtering: this allows a Denial-of-Service (DoS) attack against ASR systems. We achieve these characteristics by constructing attacks in a modified frequency domain through an inverse Fourier transform. We evaluate our method on standard keyword classification tasks, analyze it in the OTA setting, and study the properties of the cross-domain attacks to explain the efficiency of the approach.

* 10 pages, 11 Figures

Mixed-Precision Neural Networks: A Survey

Aug 11, 2022

Mariam Rakka, Mohammed E. Fouda, Pramod Khargonekar, Fadi Kurdahi

Abstract:Mixed-precision Deep Neural Networks achieve the energy efficiency and throughput needed for hardware deployment, particularly when the resources are limited, without sacrificing accuracy. However, the optimal per-layer bit precision that preserves accuracy is not easily found, especially with the abundance of models, datasets, and quantization techniques that creates an enormous search space. To tackle this difficulty, a body of literature has emerged recently, and several frameworks that achieved promising accuracy results have been proposed. In this paper, we start by summarizing the quantization techniques generally used in the literature. Then, we present a thorough survey of the mixed-precision frameworks, categorized according to their optimization techniques, such as reinforcement learning, and their quantization techniques, such as deterministic rounding. Furthermore, we discuss the advantages and shortcomings of each framework and compare them side by side. We finally give guidelines for future mixed-precision frameworks.
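
As a rough illustration of per-layer bit precision with deterministic rounding, the sketch below applies symmetric uniform quantization at a chosen bit width. It is a generic textbook-style scheme, not a framework from the survey. The function names, the layer names, and the bit assignment are assumptions made for the example.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization with deterministic rounding.

    Uses Python's built-in round(), which rounds ties to the nearest even integer.
    Returns the integer codes and the scale needed to map them back to reals.
    """
    qmax = (1 << (bits - 1)) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


# Hypothetical per-layer bit assignment: each layer gets its own precision.
layer_bits = {"conv1": 8, "conv2": 4}
layer_weights = {"conv1": [127.0, -64.0, 3.0], "conv2": [7.0, -3.0, 1.0]}
for name, weights in layer_weights.items():
    codes, scale = quantize_uniform(weights, layer_bits[name])
    print(name, layer_bits[name], codes, scale)
```

A mixed-precision framework would search over assignments like `layer_bits`; the sketch only shows what a single candidate assignment does to the weights.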

DT2CAM: A Decision Tree to Content Addressable Memory Framework

Apr 12, 2022

Mariam Rakka, Mohammed E. Fouda, Rouwaida Kanj, Fadi Kurdahi

Abstract:Decision trees are considered one of the most powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications that have limited power and latency budget. In this paper, we propose a Content Addressable Memory (CAM) Compiler for Decision Tree (DT) inference acceleration. We propose a novel "adaptive-precision" scheme that results in a compact implementation and enables an efficient bijective mapping to Ternary Content Addressable Memories while maintaining high inference accuracies. In addition, a Resistive-CAM (ReCAM) functional synthesizer is developed for mapping the decision tree to the ReCAM and performing functional simulations for energy, latency, and accuracy evaluations. We study the decision tree accuracy under hardware non-idealities including device defects, manufacturing variability, and input encoding noise. We test our framework on various DT datasets including Give Me Some Credit, Titanic, and COVID-19. Our results reveal up to 42.4% energy savings and up to 17.8x better energy-delay-area product compared to the state-of-the-art hardware accelerators, and up to 333 million decisions per second for the pipelined implementation.
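
The general idea of mapping a decision tree to a ternary CAM can be sketched as follows: every root-to-leaf path becomes one ternary row, with `x` marking features the path never tests. This is a simplified illustration over binary features only. It does not implement the paper's adaptive-precision encoding or the ReCAM synthesizer, and the tuple-based tree format and function names are hypothetical.

```python
# A node is ("leaf", label) or ("split", feature_index, left, right).
# The left child is taken when the feature bit is 0, the right child when it is 1.

def tree_to_tcam(node, n_features, pattern=None):
    """Turn each root-to-leaf path into a (ternary_pattern, label) row."""
    pattern = pattern or ["x"] * n_features
    if node[0] == "leaf":
        return [("".join(pattern), node[1])]
    _, feature, left, right = node
    rows = []
    for bit, child in (("0", left), ("1", right)):
        branch = list(pattern)
        branch[feature] = bit          # fix the bit tested on this path
        rows += tree_to_tcam(child, n_features, branch)
    return rows


def tcam_lookup(rows, bits):
    """Return the label of the first row whose pattern matches the input bits."""
    for pattern, label in rows:
        if all(p in ("x", b) for p, b in zip(pattern, bits)):
            return label
    return None


tree = ("split", 0, ("leaf", "A"), ("split", 2, ("leaf", "B"), ("leaf", "C")))
rows = tree_to_tcam(tree, n_features=3)
print(rows)
print(tcam_lookup(rows, "110"))
```

Because the paths of a tree are mutually exclusive, exactly one row matches any input, so a hardware CAM could compare all rows in parallel instead of walking the tree level by level.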

Efficient Noise Mitigation Technique for Quantum Computing

Sep 10, 2021

Ali Shaib, Mohamad H. Naim, Mohammed E. Fouda, Rouwaida Kanj, Fadi Kurdahi

Abstract:Quantum computers have enabled solving problems beyond current computers' capabilities. However, this requires handling noise arising from unwanted interactions in these systems. Several protocols have been proposed to address efficient and accurate quantum noise profiling and mitigation. In this work, we propose a novel protocol that efficiently estimates the average output of a noisy quantum device to be used for quantum noise mitigation. The multi-qubit system average behavior is approximated as a special form of a Pauli Channel where Clifford gates are used to estimate the average output for circuits of different depths. The characterized Pauli channel error rates and the state preparation and measurement errors are then used to construct the outputs for different depths, thereby eliminating the need for large simulations and enabling efficient mitigation. We demonstrate the efficiency of the proposed protocol on four IBM Q 5-qubit quantum devices. Our method demonstrates improved accuracy with efficient noise characterization. We report up to 88% and 69% improvement for the proposed approach compared to the unmitigated and pure measurement error mitigation approaches, respectively.

Resistive Neural Hardware Accelerators

Sep 08, 2021

Kamilya Smagulova, Mohammed E. Fouda, Fadi Kurdahi, Khaled Salama, Ahmed Eltawil

Abstract:Deep Neural Networks (DNNs), as a subset of Machine Learning (ML) techniques, make it possible to learn from real-world data and to make decisions in real time. However, their wide adoption is hindered by a number of software and hardware limitations. The existing general-purpose hardware platforms used to accelerate DNNs are facing new challenges associated with the growing amount of data and the exponentially increasing complexity of computations. Emerging non-volatile memory (NVM) devices and the processing-in-memory (PIM) paradigm are creating a new generation of hardware architectures with increased computing and storage capabilities. In particular, the shift towards ReRAM-based in-memory computing has great potential in the implementation of area- and power-efficient inference and in training large-scale neural network architectures. These can accelerate the entry of IoT-enabled AI technologies into our daily life. In this survey, we review the state-of-the-art ReRAM-based DNN many-core accelerators and show their superiority compared to CMOS counterparts. The review covers different aspects of hardware and software realization of DNN accelerators, their present limitations, and future prospects. In particular, comparison of the accelerators shows the need for the introduction of new performance metrics and benchmarking standards. In addition, the major concerns regarding the efficient design of accelerators include a lack of accuracy in simulation tools for software and hardware co-design.

An Accurate Non-accelerometer-based PPG Motion Artifact Removal Technique using CycleGAN

Jun 22, 2021

Amir Hosein Afandizadeh Zargari, Seyed Amir Hossein Aqajari, Hadi Khodabandeh, Amir M. Rahmani, Fadi Kurdahi

Abstract:Photoplethysmography (PPG) is a simple and inexpensive optical technique widely used in the healthcare domain to extract valuable health-related information, e.g., heart rate variability, blood pressure, and respiration rate. PPG signals can easily be collected continuously and remotely using portable wearable devices. However, these measuring devices are vulnerable to motion artifacts caused by daily life activities. The most common ways to eliminate motion artifacts use extra accelerometer sensors, which suffer from two limitations: i) high power consumption and ii) the need to integrate an accelerometer sensor in a wearable device (which is not required in certain wearables). This paper proposes a low-power non-accelerometer-based PPG motion artifact removal method that outperforms existing methods in accuracy. We use a Cycle Generative Adversarial Network (CycleGAN) to reconstruct clean PPG signals from noisy PPG signals. Our novel machine-learning-based technique achieves a 9.5 times improvement in motion artifact removal compared to the state-of-the-art without using extra sensors such as an accelerometer.

* Submitted to ACM Health

On-Chip Error-triggered Learning of Multi-layer Memristive Spiking Neural Networks

Nov 21, 2020

Melika Payvand, Mohammed E. Fouda, Fadi Kurdahi, Ahmed M. Eltawil, Emre O. Neftci

Abstract:Recent breakthroughs in neuromorphic computing show that local forms of gradient descent learning are compatible with Spiking Neural Networks (SNNs) and synaptic plasticity. Although SNNs can be scalably implemented using neuromorphic VLSI, an architecture that can learn using gradient descent in situ is still missing. In this paper, we propose a local, gradient-based, error-triggered learning algorithm with online ternary weight updates. The proposed algorithm enables online training of multi-layer SNNs with memristive neuromorphic hardware, showing a small loss in performance compared with the state of the art. We also propose a hardware architecture based on memristive crossbar arrays to perform the required vector-matrix multiplications. The necessary peripheral circuitry, including pre-synaptic, post-synaptic, and write circuits required for online training, has been designed in the sub-threshold regime for power saving with a standard 180 nm CMOS process.

* 15 pages, 11 figures, Journal of Emerging Technology in Circuits and Systems (JETCAS)
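
The notion of an error-triggered ternary weight update can be sketched for a single neuron as below. This is a simplified illustration under stated assumptions, not the learning rule from the paper: the error is a plain scalar, presynaptic activity is treated as active or inactive, and the names `threshold` and `step` are hypothetical parameters.

```python
def ternary_update(weights, pre, error, threshold, step):
    """Apply an error-triggered ternary update to one neuron's weights.

    No weight is written unless |error| reaches the threshold. When it does,
    each weight changes by exactly -step, 0, or +step.
    """
    if abs(error) < threshold:
        return list(weights)            # below threshold: no update event
    sign = 1 if error > 0 else -1
    # Only synapses with active presynaptic input are written.
    return [w - step * sign * (1 if x > 0 else 0) for w, x in zip(weights, pre)]


weights = [0.5, 0.5]
print(ternary_update(weights, pre=[1, 0], error=0.05, threshold=0.1, step=0.1))
print(ternary_update(weights, pre=[1, 0], error=0.30, threshold=0.1, step=0.1))
```

Gating the update on the error magnitude keeps the number of write operations low, which matters when each write targets a memristive device.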
