Abstract:When developing Computer Aided Detection (CAD) systems for Digital Breast Tomosynthesis (DBT), the complexity arising from the volumetric nature of the modality poses significant technical challenges for obtaining large-scale accurate annotations. Without access to large-scale annotations, the resulting model may not generalize to different domains. Given the costly nature of obtaining DBT annotations, how to effectively increase the amount of data used for training DBT CAD systems remains an open challenge. In this paper, we present SelectiveKD, a semi-supervised learning framework for building cancer detection models for DBT, which only requires a limited number of annotated slices to reach high performance. We achieve this by utilizing unlabeled slices available in a DBT stack through a knowledge distillation framework in which the teacher model provides a supervisory signal to the student model for all slices in the DBT volume. Our framework mitigates the potential noise in the supervisory signal from a sub-optimal teacher by implementing a selective dataset expansion strategy using pseudo labels. We evaluate our approach with a large-scale real-world dataset of over 10,000 DBT exams collected from multiple device manufacturers and locations. The resulting SelectiveKD process effectively utilizes unannotated slices from a DBT stack, leading to significantly improved cancer classification performance (AUC) and generalization performance.
Abstract:Deep Neural Networks (DNNs) have recently been achieving state-of-the-art performance on a variety of computer vision related tasks. However, their computational cost limits their ability to be implemented in embedded systems with restricted resources or strict latency constraints. Model compression has therefore been an active field of research to overcome this issue. On the other hand, DNNs typically require massive amounts of labeled data to be trained. This represents a second limitation to their deployment. Domain Adaptation (DA) addresses this issue by allowing to transfer knowledge learned on one labeled source distribution to a target distribution, possibly unlabeled. In this paper, we investigate on possible improvements of compression methods in DA setting. We focus on a compression method that was previously developed in the context of a single data distribution and show that, with a careful choice of data to use during compression and additional regularization terms directly related to DA objectives, it is possible to improve compression results. We also show that our method outperforms an existing compression method studied in the DA setting by a large margin for high compression rates. Although our work is based on one specific compression method, we also outline some general guidelines for improving compression in DA setting.