Abstract:Automated analysis of volumetric medical imaging on edge devices is severely constrained by the high memory and computational demands of 3D Convolutional Neural Networks (CNNs). This paper develops a lightweight computer vision framework that reconciles the efficiency of 2D detection with the necessity of 3D context by reformulating volumetric Computer Tomography (CT) data as sequential video streams. This video-viewpoint paradigm is applied to the time-sensitive task of Intracranial Hemorrhage (ICH) detection using the Hemorica dataset. To ensure operational efficiency, we benchmarked multiple generations of the YOLO architecture (v8, v10, v11 and v12) in their Nano configurations, selecting the version with the highest mAP@50 to serve as the slice-level backbone. A ByteTrack algorithm is then introduced to enforce anatomical consistency across the $z$-axis. To address the initialization lag inherent in video trackers, a hybrid inference strategy and a spatiotemporal consistency filter are proposed to distinguish true pathology from transient prediction noise. Experimental results on independent test data demonstrate that the proposed framework serves as a rigorous temporal validator, increasing detection Precision from 0.703 to 0.779 compared to the baseline 2D detector, while maintaining high sensitivity. By approximating 3D contextual reasoning at a fraction of the computational cost, this method provides a scalable solution for real-time patient prioritization in resource-constrained environments, such as mobile stroke units and IoT-enabled remote clinics.
Abstract:The proposed YOLO-Former method seamlessly integrates the ideas of transformer and YOLOv4 to create a highly accurate and efficient object detection system. The method leverages the fast inference speed of YOLOv4 and incorporates the advantages of the transformer architecture through the integration of convolutional attention and transformer modules. The results demonstrate the effectiveness of the proposed approach, with a mean average precision (mAP) of 85.76\% on the Pascal VOC dataset, while maintaining high prediction speed with a frame rate of 10.85 frames per second. The contribution of this work lies in the demonstration of how the innovative combination of these two state-of-the-art techniques can lead to further improvements in the field of object detection.




Abstract:During the COVID-19 pandemic, wearing a face mask has been known to be an effective way to prevent the spread of COVID-19. In lots of monitoring tasks, humans have been replaced with computers thanks to the outstanding performance of the deep learning models. Monitoring the wearing of a face mask is another task that can be done by deep learning models with acceptable accuracy. The main challenge of this task is the limited amount of data because of the quarantine. In this paper, we did an investigation on the capability of three state-of-the-art object detection neural networks on face mask detection for real-time applications. As mentioned, here are three models used, Single Shot Detector (SSD), two versions of You Only Look Once (YOLO) i.e., YOLOv4-tiny, and YOLOv4-tiny-3l from which the best was selected. In the proposed method, according to the performance of different models, the best model that can be suitable for use in real-world and mobile device applications in comparison to other recent studies was the YOLOv4-tiny model, with 85.31% and 50.66 for mean Average Precision (mAP) and Frames Per Second (FPS), respectively. These acceptable values were achieved using two datasets with only 1531 images in three separate classes.