Abstract:Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag experimental accuracy. Here, we present a novel fine-tuning approach to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays using a Normalised Log-odds Ratio (NLR) head. We find consistent improvements in a held-out protein test set, and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
Abstract:We present an application of the well-known Mask R-CNN approach to the counting of different types of bacterial colony forming units that were cultured in Petri dishes. Our model was made available to lab technicians in a modern SPA (Single-Page Application). Users can upload images of dishes, after which the Mask R-CNN model that was trained and tuned specifically for this task detects the number of BVG- and BVG+ colonies and displays these in an interactive interface for the user to verify. Users can then check the model's predictions, correct them if deemed necessary, and finally validate them. Our adapted Mask R-CNN model achieves a mean average precision (mAP) of 94\% at an intersection-over-union (IoU) threshold of 50\%. With these encouraging results, we see opportunities to bring the benefits of improved accuracy and time saved to related problems, such as generalising to other bacteria types and viral foci counting.
Abstract:During the development of vaccines, bacterial colony forming units (CFUs) are counted in order to quantify the yield in the fermentation process. This manual task is time-consuming and error-prone. In this work we test multiple segmentation algorithms based on the U-Net CNN architecture and show that these offer robust, automated CFU counting. We show that the multiclass generalisation with a bespoke loss function allows distinguishing virulent and avirulent colonies with acceptable accuracy. While many possibilities are left to explore, our results show the potential of deep learning for separating and classifying bacterial colonies.