
Ingoo Lee

Evaluation of large language models for discovery of gene set function

Sep 07, 2023

Mengzhou Hu, Sahar Alkhairy, Ingoo Lee, Rudolf T. Pillich, Robin Bachelder, Trey Ideker, Dexter Pratt

Abstract: Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that were largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.


DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

Nov 06, 2018

Ingoo Lee, Jongsoo Keum, Hojung Nam


Abstract: Identification of drug-target interactions (DTIs) plays a key role in drug discovery. The high cost and labor-intensive nature of in vitro and in vivo experiments have highlighted the importance of in silico DTI prediction approaches. In several computational models, conventional protein descriptors have been shown to be insufficiently informative to predict DTIs accurately. Thus, in this study, we employ a convolutional neural network (CNN) on raw protein sequences to capture local residue patterns participating in DTIs. With a CNN on protein sequences, our model performs better than previous protein descriptor-based models. In addition, our model performs better than the previous deep learning model for massive prediction of DTIs. By examining the pooled convolution results, we found that our model can detect binding sites of proteins for DTIs. In conclusion, our prediction model for detecting local residue patterns of target proteins successfully enriches the protein features of a raw protein sequence, yielding better prediction results than previous approaches.

* 26 pages, 7 figures
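
The abstract above describes convolving over a raw protein sequence and inspecting the pooled results to locate residue patterns. The following is a minimal sketch of that general idea, not the authors' implementation: the one-hot encoding, the function names `one_hot` and `conv1d_global_max`, the filter shapes, and the example sequence are all assumptions made for illustration, and the filters here are random rather than trained.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a (length, 20) one-hot matrix (assumed encoding)."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        if aa in AA_INDEX:  # unknown residues stay all-zero
            encoded[pos, AA_INDEX[aa]] = 1.0
    return encoded


def conv1d_global_max(encoded, filters):
    """Slide each filter along the sequence, apply ReLU, then max-pool globally.

    filters has shape (n_filters, window, 20).
    Returns (pooled, positions): pooled[k] is the strongest activation of
    filter k, and positions[k] is the start index of the window producing it.
    """
    n_filters, window, _ = filters.shape
    n_windows = encoded.shape[0] - window + 1
    if n_windows < 1:
        raise ValueError("sequence is shorter than the filter window")
    activations = np.empty((n_windows, n_filters))
    for start in range(n_windows):
        patch = encoded[start:start + window]
        # Dot product of every filter with the current window of residues.
        activations[start] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    activations = np.maximum(activations, 0.0)  # ReLU
    return activations.max(axis=0), activations.argmax(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    random_filters = rng.normal(size=(4, 5, 20))  # untrained, illustrative only
    pooled, positions = conv1d_global_max(one_hot("MKTAYIAKQRQISFVKSHFSRQ"), random_filters)
    print(pooled, positions)
```

Because global max pooling keeps only the strongest window per filter, the returned positions indicate which stretch of residues drove each pooled feature; this is the kind of inspection the abstract refers to when relating pooled convolution results to binding sites.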
