Abstract: Medical image classification plays a crucial role in computer-aided clinical diagnosis. While deep learning techniques have significantly improved efficiency and reduced costs, the privacy-sensitive nature of medical imaging data complicates centralized storage and model training. Furthermore, low-resource healthcare organizations face communication overhead and efficiency challenges as data and model scales grow. To address these issues, this paper proposes FedMIC, a novel privacy-preserving medical image classification framework based on federated learning. The framework enables healthcare organizations to learn from both global and local knowledge, enhancing the local representation of private data despite statistical heterogeneity. It provides customized models for organizations with diverse data distributions while minimizing communication overhead and improving efficiency without compromising performance. FedMIC thereby enhances robustness and practical applicability under resource-constrained conditions. We demonstrate FedMIC's effectiveness on classical medical image classification tasks using four public medical image datasets.
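The abstract does not specify how FedMIC aggregates client updates or how it combines global and local knowledge. As a rough illustration of the general idea only, the sketch below shows standard sample-size-weighted federated averaging (FedAvg), followed by a hypothetical personalization step in which a client adopts the shared layers from the global model and keeps its remaining layers local. The function names, the dictionary layout of the weights, and the `shared_keys` split are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, Iterable, List

# Assumed representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client parameters (plain FedAvg)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


def personalize(global_weights: Weights, local_weights: Weights,
                shared_keys: Iterable[str]) -> Weights:
    """Hypothetical split: shared layers come from the server, the rest stay local."""
    shared = set(shared_keys)
    return {
        name: list(global_weights[name]) if name in shared else list(values)
        for name, values in local_weights.items()
    }


if __name__ == "__main__":
    clients = [
        {"backbone": [1.0, 2.0], "head": [0.0]},
        {"backbone": [3.0, 4.0], "head": [10.0]},
    ]
    global_model = federated_average(clients, client_sizes=[1, 3])
    print(global_model)
    print(personalize(global_model, clients[0], shared_keys=["backbone"]))
```

Only the parameters are exchanged in this scheme, so the raw images never leave the client. Sending just a subset of layers, as in the hypothetical `shared_keys` split, is one common way to reduce communication cost.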
Abstract: Retinal optical coherence tomography (OCT) images provide crucial insights into the health of the posterior ocular segment. Automated image analysis methods are therefore needed to equip clinicians and researchers with quantitative data that support informed decision-making. Deep learning (DL)-based approaches have been widely adopted for these analysis tasks and perform remarkably well compared with labor-intensive manual analysis. However, acquiring retinal OCT images is often hindered by privacy concerns and resource-intensive labeling procedures, which conflicts with the large data volumes that DL models typically need to achieve superior performance. Moreover, limited computational resources constrain the progress of high-performance medical artificial intelligence, particularly in less developed regions and countries. This paper introduces a novel ensemble learning mechanism designed for recognizing retinal diseases under limited resources (e.g., data, computation). The mechanism leverages insights from multiple pre-trained models, transferring and adapting their knowledge to retinal OCT images. This approach yields a robust model even with limited labeled data, without the large number of trainable parameters required when learning from scratch. Comprehensive experiments on real-world datasets demonstrate that the proposed approach achieves superior performance in recognizing retinal OCT images, even with extremely limited labeled data. Furthermore, because the method avoids learning a large number of parameters, it is well suited for deployment in low-resource scenarios.
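The abstract does not say how the outputs of the pre-trained models are combined. One common and inexpensive option is soft voting, where each model's class-probability vector is averaged, optionally with per-model weights. The sketch below illustrates that generic technique under the assumption that every model already outputs normalized probabilities. The names `soft_vote` and `predict` are hypothetical, and this is not claimed to be the paper's mechanism.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_sets: Sequence[Sequence[float]],
              model_weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-model class-probability vectors."""
    if not prob_sets:
        raise ValueError("need at least one model output")
    num_classes = len(prob_sets[0])
    if any(len(p) != num_classes for p in prob_sets):
        raise ValueError("all models must predict the same number of classes")
    if model_weights is None:
        model_weights = [1.0] * len(prob_sets)  # plain (unweighted) averaging
    if len(model_weights) != len(prob_sets):
        raise ValueError("need one weight per model")
    total = sum(model_weights)
    if total <= 0:
        raise ValueError("model weights must sum to a positive value")
    return [
        sum(w * p[c] for w, p in zip(model_weights, prob_sets)) / total
        for c in range(num_classes)
    ]


def predict(prob_sets: Sequence[Sequence[float]],
            model_weights: Optional[Sequence[float]] = None) -> int:
    """Index of the class with the highest fused probability."""
    fused = soft_vote(prob_sets, model_weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    outputs = [[0.5, 0.5], [0.25, 0.75]]  # two hypothetical models, two classes
    print(soft_vote(outputs), predict(outputs))
```

Because only the fusion weights (if any) are learned on top of frozen pre-trained models, a scheme like this needs far fewer trainable parameters than training a network from scratch.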
Abstract: Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields. Due to location differences, growth in data size, and centralized server storage constraints, remote sensing data are usually stored in different databases across regions and countries. However, privacy laws and national security concerns prevent researchers from accessing these sensitive remote sensing images for further analysis. Additionally, low-resource remote sensing devices face communication overhead and efficiency challenges when dealing with ever-increasing data and model scales. To address these challenges, this paper proposes a novel Privacy-Preserving TFGC Framework based on Federated Learning, dubbed PRFL. The proposed framework allows each client to learn global and local knowledge to enhance the local representation of private data in environments with extreme statistical heterogeneity (non-Independent and Identically Distributed, non-IID). It thus provides highly customized models to clients with differing data distributions. Moreover, the framework minimizes communication overhead and improves efficiency while ensuring satisfactory performance, thereby enhancing robustness and practical applicability under resource-scarce conditions. We demonstrate the effectiveness of the proposed PRFL on the classical TFGC task using four public datasets.
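To make "extreme statistical heterogeneity" concrete, the sketch below shows one simple way such a non-IID setting is often simulated in federated learning experiments: samples are sorted by class label and cut into contiguous shards, so each client sees only a few classes. The abstract does not describe how PRFL's experiments partition data, so this sort-and-shard scheme and the name `label_skew_partition` are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Sequence


def label_skew_partition(labels: Sequence[int], num_clients: int) -> Dict[int, List[int]]:
    """Assign sample indices to clients so that each client sees few classes."""
    if num_clients <= 0 or num_clients > len(labels):
        raise ValueError("num_clients must be between 1 and len(labels)")
    # Stable sort of sample indices by class label.
    ordered = sorted(range(len(labels)), key=lambda i: labels[i])
    # Spread any remainder over the first clients so every index is used once.
    base, extra = divmod(len(ordered), num_clients)
    shards: Dict[int, List[int]] = {}
    start = 0
    for client in range(num_clients):
        size = base + (1 if client < extra else 0)
        shards[client] = ordered[start:start + size]
        start += size
    return shards


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 2, 2]
    for client, indices in label_skew_partition(labels, num_clients=3).items():
        print(client, indices, sorted({labels[i] for i in indices}))
```

In this toy run each of the three clients ends up holding a single class, which is the kind of skew under which one shared global model tends to fit individual clients poorly and customized per-client models become attractive.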
Abstract: Chest imaging plays an essential role in diagnosis and prognosis for COVID-19 patients with evidence of worsening respiratory status. Many deep learning-based diagnostic models for pneumonia have been developed to enable computer-aided diagnosis. However, their long training and inference times make them inflexible. In addition, their lack of interpretability reduces their credibility in clinical practice. This paper presents CMT, an interpretable model for rapid recognition of pneumonia, especially COVID-19-positive cases. Multiple convolutional layers in CMT first extract features from chest X-ray (CXR) images, and a Transformer is then applied to calculate the probability of each symptom. To improve the model's generalization performance and to address the problem of sparse medical image data, we propose Feature Fusion Augmentation (FFA), a plug-and-play method for image augmentation. It fuses the features of two images to varying degrees to produce a new image that does not deviate from the original distribution. Furthermore, to reduce computational complexity and accelerate convergence, we propose Multilevel Multi-Head Self-Attention (MMSA), which computes attention at different levels to establish the relationship between global and local features. It significantly improves model performance while substantially reducing training and inference time. Experimental results on the largest COVID-19 dataset show that the proposed CMT achieves state-of-the-art performance. The effectiveness of FFA and MMSA is demonstrated in ablation experiments. In addition, the weights and feature activation maps from the model's inference process are visualized to show CMT's interpretability.
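The abstract describes FFA only as fusing two images "to varying degrees". The simplest operation matching that description is a convex combination of two equally sized images, as in mixup-style augmentation. The sketch below shows that generic blend on nested lists of pixel intensities. It is an assumption-laden stand-in and not the paper's FFA, whose exact fusion rule the abstract does not give. The name `blend_images` and the `ratio` parameter are hypothetical.

```python
from typing import List

Image = List[List[float]]  # assumed layout: rows of grayscale pixel intensities


def blend_images(image_a: Image, image_b: Image, ratio: float) -> Image:
    """Convex combination: ratio * image_a + (1 - ratio) * image_b, per pixel."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same shape")
    return [
        [ratio * a + (1.0 - ratio) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


if __name__ == "__main__":
    first = [[0.0, 1.0], [0.5, 0.5]]
    second = [[1.0, 0.0], [0.5, 1.0]]
    print(blend_images(first, second, ratio=0.25))
```

Since every output pixel lies between the two input pixels, the blended image stays within the intensity range of its sources. When the two images carry different labels, mixup-style schemes typically mix the labels with the same ratio.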