Abstract:Creating large annotated datasets represents a major bottleneck for the development of deep learning models in radiology. To overcome this, we propose a combined use of weak labels (imprecise, but fast-to-create annotations) and Transfer Learning (TL). Specifically, we explore inductive TL, where source and target domains are identical, but tasks are different due to a label shift: our target labels are created manually by three radiologists, whereas the source weak labels are generated automatically from textual radiology reports. We frame knowledge transfer as hyperparameter optimization, thus avoiding heuristic choices that are frequent in related works. We investigate the relationship between model size and TL, comparing a low-capacity VGG with a higher-capacity SEResNeXt. The task that we address is change detection in follow-up glioma imaging: we extracted 1693 T2-weighted magnetic resonance imaging difference maps from 183 patients, and classified them into stable or unstable according to tumor evolution. Weak labeling allowed us to increase dataset size more than 3-fold, and improve VGG classification results from 75% to 82% Area Under the ROC Curve (AUC) (p=0.04). Mixed training from scratch led to higher performance than fine-tuning or feature extraction. To assess generalizability, we also ran inference on an open dataset (BraTS-2015: 15 patients, 51 difference maps), reaching up to 76% AUC. Overall, results suggest that medical imaging problems may benefit from smaller models and different TL strategies with respect to computer vision problems, and that report-generated weak labels are effective in improving model performances. Code, in-house dataset and BraTS labels are released.