Allen Goodman

Concept Bottleneck Language Models For protein design

Nov 09, 2024

Aya Abdelsalam Ismail, Tuomas Oikarinen, Amy Wang, Julius Adebayo, Samuel Stanton, Taylor Joren, Joseph Kleinhenz, Allen Goodman, Héctor Corrada Bravo, Kyunghyun Cho(+1 more)

Abstract: We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can intervene on concept values to precisely control the properties of generated proteins, achieving a 3 times larger change in desired concept values compared to baselines. ii) Interpretability: A linear mapping between concept values and predicted tokens allows transparent analysis of the model's decision-making process. iii) Debugging: This transparency facilitates easy debugging of trained models. Our models achieve pre-training perplexity and downstream task performance comparable to traditional masked protein language models, demonstrating that interpretability does not compromise performance. While adaptable to any language model, we focus on masked protein language models due to their importance in drug discovery and the ability to validate our model's capabilities through real-world experiments and expert knowledge. We scale our CB-pLM from 24 million to 3 billion parameters, making them the largest Concept Bottleneck Models trained and the first capable of generative language modeling.
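
The abstract's point about a linear mapping between concept values and predicted tokens can be illustrated with a toy sketch. This is not the CB-pLM implementation: the function names, the two-concept and three-token sizes, and all weights below are hypothetical values chosen for illustration. It only shows why a linear map makes each concept's contribution to a token logit readable, and why overwriting one concept value shifts the logits by a predictable amount.

```python
# Toy illustration only: sizes, weights, and names are hypothetical,
# not taken from the paper.

def token_logits(concepts, weights, bias):
    """Linear map from concept values to one logit per token."""
    return [sum(w * c for w, c in zip(row, concepts)) + b
            for row, b in zip(weights, bias)]

def contributions(concepts, weights):
    """Per-concept contribution to each token logit: weights[t][k] * concepts[k]."""
    return [[w * c for w, c in zip(row, concepts)] for row in weights]

def intervene(concepts, index, new_value):
    """Return a copy of the concept vector with one concept overwritten."""
    edited = list(concepts)
    edited[index] = new_value
    return edited

# Hypothetical setup: 2 concepts, 3 tokens.
CONCEPTS = [0.5, -1.0]
WEIGHTS = [[2.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
BIAS = [0.0, 0.5, 0.0]

base = token_logits(CONCEPTS, WEIGHTS, BIAS)

# Intervene on concept 0, raising it from 0.5 to 1.5.
edited = token_logits(intervene(CONCEPTS, 0, 1.5), WEIGHTS, BIAS)

# With a linear map, the logit shift equals the concept's weight column
# times the change in the concept value.
shift = [e - b for e, b in zip(edited, base)]

if __name__ == "__main__":
    print("base logits:", base)
    print("contributions:", contributions(CONCEPTS, WEIGHTS))
    print("shift after intervention:", shift)
```

In this toy setting, the shift for each token is exactly the weight connecting concept 0 to that token multiplied by the size of the intervention, which is the kind of transparency the abstract attributes to the linear layer.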

Applying Faster R-CNN for Object Detection on Malaria Images

Apr 25, 2018

Jane Hung, Allen Goodman, Stefanie Lopes, Gabriel Rangel, Deepali Ravel, Fabio Costa, Manoj Duraisingh, Matthias Marti, Anne Carpenter

Abstract: Deep-learning-based models have had great success in object detection, but state-of-the-art models have not yet been widely applied to biological image data. For the first time, we apply an object detection model previously used on natural images to identify cells and recognize their stages in brightfield microscopy images of malaria-infected blood. Many micro-organisms, such as malaria parasites, are still studied by expert manual inspection and hand counting. This type of object detection task is challenging due to factors like variations in cell shape, density, and color, and uncertainty about some cell classes. In addition, annotated data useful for training is scarce, and the class distribution is inherently highly imbalanced due to the dominance of uninfected red blood cells. We use Faster Region-based Convolutional Neural Network (Faster R-CNN), one of the top-performing object detection models in recent years, pre-trained on ImageNet but fine-tuned with our data, and compare it to a baseline based on a traditional approach consisting of cell segmentation, extraction of several single-cell features, and classification using random forests. To conduct our initial study, we collect and label a dataset of 1,300 fields of view consisting of around 100,000 individual cells. We demonstrate that Faster R-CNN outperforms our baseline and put the results in the context of human performance.

* CVPR 2017: computer vision for microscopy image analysis (CVMI) Workshop
