Abstract:Vision Transformer (ViT) is becoming widely popular in automating accurate disease diagnosis in medical imaging owing to its robust self-attention mechanism. However, ViTs remain vulnerable to adversarial attacks that may thwart the diagnosis process by leading it to intentional misclassification of critical disease. In this paper, we propose a novel image classification pipeline, namely, S-E Pipeline, that performs multiple pre-processing steps that allow ViT to be trained on critical features so as to reduce the impact of input perturbations by adversaries. Our method uses a combination of segmentation and image enhancement techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE), Unsharp Masking (UM), and High-Frequency Emphasis filtering (HFE) as preprocessing steps to identify critical features that remain intact even after adversarial perturbations. The experimental study demonstrates that our novel pipeline helps in reducing the effect of adversarial attacks by 72.22% for the ViT-b32 model and 86.58% for the ViT-l32 model. Furthermore, we have shown an end-to-end deployment of our proposed method on the NVIDIA Jetson Orin Nano board to demonstrate its practical use case in modern hand-held devices that are usually resource-constrained.
Abstract:Accelerating compute intensive non-real-time beam-forming algorithms in ultrasound imaging using deep learning architectures has been gaining momentum in the recent past. Nonetheless, the complexity of the state-of-the-art deep learning techniques poses challenges for deployment on resource-constrained edge devices. In this work, we propose a novel vision transformer based tiny beamformer (Tiny-VBF), which works on the raw radio-frequency channel data acquired through single-angle plane wave insonification. The output of our Tiny-VBF provides fast envelope detection requiring very low frame rate, i.e. 0.34 GOPs/Frame for a frame size of 368 x 128 in comparison to the state-of-the-art deep learning models. It also exhibited an 8% increase in contrast and gains of 5% and 33% in axial and lateral resolution respectively when compared to Tiny-CNN on in-vitro dataset. Additionally, our model showed a 4.2% increase in contrast and gains of 4% and 20% in axial and lateral resolution respectively when compared against conventional Delay-and-Sum (DAS) beamformer. We further propose an accelerator architecture and implement our Tiny-VBF model on a Zynq UltraScale+ MPSoC ZCU104 FPGA using a hybrid quantization scheme with 50% less resource consumption compared to the floating-point implementation, while preserving the image quality.