Abstract:Deep neural networks (DNNs) are increasingly used in safety-critical applications. Reliable fault analysis and mitigation are essential to ensure their functionality in harsh environments that contain high radiation levels. This study analyses the impact of multiple single-bit single-event upsets in DNNs by performing fault injection at the level of a DNN model. Additionally, a fault aware training (FAT) methodology is proposed that improves the DNNs' robustness to faults without any modification to the hardware. Experimental results show that the FAT methodology improves the tolerance to faults up to a factor 3.
Abstract:Deep Neural Network (DNN) accelerators are extensively used to improve the computational efficiency of DNNs, but are prone to faults through Single-Event Upsets (SEUs). In this work, we present an in-depth analysis of the impact of SEUs on a Systolic Array (SA) based DNN accelerator. A fault injection campaign is performed through a Register-Transfer Level (RTL) based simulation environment to improve the observability of each hardware block, including the SA itself as well as the post-processing pipeline. From this analysis, we present the sensitivity, independent of a DNN model architecture, for various flip-flop groups both in terms of fault propagation probability and fault magnitude. This allows us to draw detailed conclusions and determine optimal mitigation strategies.