Abstract:Cervical spine fractures constitute a critical medical emergency, with the potential for lifelong paralysis or even fatality if left untreated or undetected. Over time, these fractures can deteriorate without intervention. To address the lack of research on the practical application of deep learning techniques for the detection of spine fractures, this study leverages a dataset containing both cervical spine fractures and non-fractured computed tomography images. This paper introduces a two-stage pipeline designed to identify the presence of cervical vertebrae in each image slice and pinpoint the location of fractures. In the first stage, a multi-input network, incorporating image and image metadata, is trained. This network is based on the Global Context Vision Transformer, and its performance is benchmarked against popular deep learning image classification model. In the second stage, a YOLOv8 model is trained to detect fractures within the images, and its effectiveness is compared to YOLOv5. The obtained results indicate that the proposed algorithm significantly reduces the workload of radiologists and enhances the accuracy of fracture detection.