Hospital Clínic University of Barcelona IDIBAPS CIBERehd, Spain
Abstract:We present a system for the prediction of microsatellite instability (MSI) from H&E images of colorectal cancer using deep learning (DL) techniques customized for tissue microarrays (TMAs). The system incorporates an end-to-end image preprocessing module that produces tiles at multiple magnifications in the regions of interest as guided by a tissue classifier module, and a multiple-bias rejecting module. The training and validation TMA samples were obtained from the EPICOLON project and further enriched with samples from a single institution. A systematic study of biases at tile level identified three protected (bias) variables associated with the learned representations of a baseline model: the project of origin of samples, the patient spot and the TMA glass where each spot was placed. A multiple bias rejecting technique based on adversarial training is implemented at the DL architecture so to directly avoid learning the batch effects of those variables. The learned features from the bias-ablated model have maximum discriminative power with respect to the task and minimal statistical mean dependence with the biases. The impact of different magnifications, types of tissues and the model performance at tile vs patient level is analyzed. The AUC at tile level, and including all three selected tissues (tumor epithelium, mucine and lymphocytic regions) and 4 magnifications, was 0.87 +/- 0.03 and increased to 0.9 +/- 0.03 at patient level. To the best of our knowledge, this is the first work that incorporates a multiple bias ablation technique at the DL architecture in digital pathology, and the first using TMAs for the MSI prediction task.