Abstract:A major challenge in applying deep learning to medical imaging is the paucity of annotated data. This study demonstrates that synthetic colonoscopy images generated by Generative Adversarial Network (GAN) inversion can be used as training data to improve the lesion classification performance of deep learning models. This approach inverts pairs of images with the same label to a semantically rich & disentangled latent space and manipulates latent representations to produce new synthetic images with the same label. We perform image modality translation (style transfer) between white light and narrowband imaging (NBI). We also generate realistic-looking synthetic lesion images by interpolating between original training images to increase the variety of lesion shapes in the training dataset. We show that these approaches outperform comparative colonoscopy data augmentation techniques without the need to re-train multiple generative models. This approach also leverages information from datasets that may not have been designed for the specific colonoscopy downstream task. E.g. using a bowel prep grading dataset for a polyp classification task. Our experiments show this approach can perform multiple colonoscopy data augmentations, which improve the downstream polyp classification performance over baseline and comparison methods by up to 6%.
Abstract:While data-driven approaches excel at many image analysis tasks, the performance of these approaches is often limited by a shortage of annotated data available for training. Recent work in semi-supervised learning has shown that meaningful representations of images can be obtained from training with large quantities of unlabeled data, and that these representations can improve the performance of supervised tasks. Here, we demonstrate that an unsupervised jigsaw learning task, in combination with supervised training, results in up to a 9.8% improvement in correctly classifying lesions in colonoscopy images when compared to a fully-supervised baseline. We additionally benchmark improvements in domain adaptation and out-of-distribution detection, and demonstrate that semi-supervised learning outperforms supervised learning in both cases. In colonoscopy applications, these metrics are important given the skill required for endoscopic assessment of lesions, the wide variety of endoscopy systems in use, and the homogeneity that is typical of labeled datasets.