Abstract:Lung adenocarcinoma is a morphologically heterogeneous disease, characterized by five primary histologic growth patterns. The quantity of these patterns can be related to tumor behavior and has a significant impact on patient prognosis. In this work, we propose a novel machine learning pipeline capable of classifying tissue tiles into one of the five patterns or as non-tumor, with an Area Under the Receiver Operating Characteristic Curve (AUCROC) score of 0.97. Our model's strength lies in its comprehensive consideration of cellular spatial patterns, where it first generates cell maps from Hematoxylin and Eosin (H&E) whole slide images (WSIs), which are then fed into a convolutional neural network classification model. Exploiting these cell maps provides the model with robust generalizability to new data, achieving approximately 30% higher accuracy on unseen test-sets compared to current state of the art approaches. The insights derived from our model can be used to predict prognosis, enhancing patient outcomes.
Abstract:Federated learning (FL) enables collaborative learning of a deep learning model without sharing the data of participating sites. FL in medical image analysis tasks is relatively new and open for enhancements. In this study, we propose FedDropoutAvg, a new federated learning approach for training a generalizable model. The proposed method takes advantage of randomness, both in client selection and also in federated averaging process. We compare FedDropoutAvg to several algorithms in an FL scenario for real-world multi-site histopathology image classification task. We show that with FedDropoutAvg, the final model can achieve performance better than other FL approaches and closer to a classical deep learning model that requires all data to be shared for centralized training. We test the trained models on a large dataset consisting of 1.2 million image tiles from 21 different centers. To evaluate the generalization ability of the proposed approach, we use held-out test sets from centers whose data was used in the FL and for unseen data from other independent centers whose data was not used in the federated training. We show that the proposed approach is more generalizable than other state-of-the-art federated training approaches. To the best of our knowledge, ours is the first study to use a randomized client and local model parameter selection procedure in a federated setting for a medical image analysis task.