Abstract:Understanding the progression of cancer is crucial for defining treatments for patients. The objective of this study is to automate the detection of metastatic liver disease from free-style computed tomography (CT) radiology reports. Our research demonstrates that transferring knowledge using three approaches can improve model performance. First, we utilize generic language models (LMs), pretrained in a self-supervised manner. Second, we use a semi-supervised approach to train our model by automatically annotating a large unlabeled dataset; this approach substantially enhances the model's performance. Finally, we transfer knowledge from related tasks by designing a multi-task transfer learning methodology. We leverage the recent advancement of parameter-efficient LM adaptation strategies to improve performance and training efficiency. Our dataset consists of CT reports collected at Memorial Sloan Kettering Cancer Center (MSKCC) over the course of 12 years. 2,641 reports were manually annotated by domain experts; among them, 841 reports have been annotated for the presence of liver metastases. Our best model achieved an F1-score of 73.8%, a precision of 84%, and a recall of 65.8%.