Abstract: Lung cancer has been one of the major threats to human life for decades. Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization. Large vision-language models (VLMs) have been found effective for multiple downstream medical tasks that rely on both imaging and text data. However, lesion-level detection and subsequent diagnosis using VLMs have not yet been explored. We propose CADe, a method for segmenting lung nodules in a zero-shot manner using MedSAM, a variant of the Segment Anything Model. CADe learns a prompt suite for input computed tomography (CT) scans using the CLIP text encoder through prefix tuning. We also propose CADx, a method for characterizing nodules as benign or malignant by building a gallery of radiomic features and aligning image-feature pairs through contrastive learning. CADe and CADx are trained and validated on LIDC, one of the largest publicly available datasets. To assess generalization, the model is also evaluated on a challenging dataset, LUNGx. Our experimental results show that the proposed methods achieve a sensitivity of 0.86, compared to 0.76 for other fully supervised methods. The source code, datasets and pre-processed data can be accessed using the link: