Abstract:The rapid increase in the number of Computed Tomography (CT) scan examinations has created an urgent need for automated tools, such as organ segmentation, anomaly classification, and report generation, to assist radiologists with their growing workload. Multi-label classification of Three-Dimensional (3D) CT scans is a challenging task due to the volumetric nature of the data and the variety of anomalies to be detected. Existing deep learning methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies effectively, while Vision Transformers require extensive pre-training, posing challenges for practical use. Additionally, these existing methods do not explicitly model the radiologist's navigational behavior while scrolling through CT scan slices, which requires both global context understanding and local detail awareness. In this study, we present CT-Scroll, a novel global-local attention model specifically designed to emulate the scrolling behavior of radiologists during the analysis of 3D CT scans. Our approach is evaluated on two public datasets, demonstrating its efficacy through comprehensive experiments and an ablation study that highlights the contribution of each model component.
Abstract:Early detection of cervical cancer is crucial for improving patient outcomes and reducing mortality by identifying precancerous lesions as soon as possible. As a result, the use of pap smear screening has significantly increased, leading to a growing demand for automated tools that can assist cytologists managing their rising workload. To address this, the Pap Smear Cell Classification Challenge (PS3C) has been organized in association with ISBI in 2025. This project aims to promote the development of automated tools for pap smear images classification. The analyzed images are grouped into four categories: healthy, unhealthy, both, and rubbish images which are considered as unsuitable for diagnosis. In this work, we propose a two-stage ensemble approach: first, a neural network determines whether an image is rubbish or not. If not, a second neural network classifies the image as containing a healthy cell, an unhealthy cell, or both.
Abstract:Early detection of cervical cancer is crucial for improving patient outcomes and reducing mortality by identifying precancerous lesions as soon as possible. As a result, the use of pap smear screening has significantly increased, leading to a growing demand for automated tools that can assist cytologists managing their rising workload. To address this, the Pep Smear Cell Classification Challenge (PS3C) has been organized in association with ISBI in 2025. This project aims to promote the development of automated tools for pep smear images classification. The analyzed images are grouped into four categories: healthy, unhealthy, both, and rubbish images which are considered as unsuitable for diagnosis. In this work, we propose a two-stage ensemble approach: first, a neural network determines whether an image is rubbish or not. If not, a second neural network classifies the image as containing a healthy cell, an unhealthy cell, or both.
Abstract:The rapid increase of computed tomography (CT) scans and their time-consuming manual analysis have created an urgent need for robust automated analysis techniques in clinical settings. These aim to assist radiologists and help them managing their growing workload. Existing methods typically generate entire reports directly from 3D CT images, without explicitly focusing on observed abnormalities. This unguided approach often results in repetitive content or incomplete reports, failing to prioritize anomaly-specific descriptions. We propose a new anomaly-guided report generation model, which first predicts abnormalities and then generates targeted descriptions for each. Evaluation on a public dataset demonstrates significant improvements in report quality and clinical relevance. We extend our work by conducting an ablation study to demonstrate its effectiveness.
Abstract:We propose a novel method for geolocalizing Unmanned Aerial Vehicles (UAVs) in environments lacking Global Navigation Satellite Systems (GNSS). Current state-of-the-art techniques employ an offline-trained encoder to generate a vector representation (embedding) of the UAV's current view, which is then compared with pre-computed embeddings of geo-referenced images to determine the UAV's position. Here, we demonstrate that the performance of these methods can be significantly enhanced by preprocessing the images to extract their edges, which exhibit robustness to seasonal and illumination variations. Furthermore, we establish that utilizing edges enhances resilience to orientation and altitude inaccuracies. Additionally, we introduce a confidence criterion for localization. Our findings are substantiated through synthetic experiments.