Abstract:Early diagnosis of lung cancer is a key intervention for the treatment of lung cancer computer aided diagnosis (CAD) can play a crucial role. However, most published CAD methods treat lung cancer diagnosis as a lung nodule classification problem, which does not reflect clinical practice, where clinicians diagnose a patient based on a set of images of nodules, instead of one specific nodule. Besides, the low interpretability of the output provided by these methods presents an important barrier for their adoption. In this article, we treat lung cancer diagnosis as a multiple instance learning (MIL) problem in order to better reflect the diagnosis process in the clinical setting and for the higher interpretability of the output. We chose radiomics as the source of input features and deep attention-based MIL as the classification algorithm.The attention mechanism provides higher interpretability by estimating the importance of each instance in the set for the final diagnosis.In order to improve the model's performance in a small imbalanced dataset, we introduce a new bag simulation method for MIL.The results show that our method can achieve a mean accuracy of 0.807 with a standard error of the mean (SEM) of 0.069, a recall of 0.870 (SEM 0.061), a positive predictive value of 0.928 (SEM 0.078), a negative predictive value of 0.591 (SEM 0.155) and an area under the curve (AUC) of 0.842 (SEM 0.074), outperforming other MIL methods.Additional experiments show that the proposed oversampling strategy significantly improves the model's performance. In addition, our experiments show that our method provides an indication of the importance of each nodule in determining the diagnosis, which combined with the well-defined radiomic features, make the results more interpretable and acceptable for doctors and patients.