Title: A Hitchhiker's Guide to Deep Chemical Language Processing for Bioactivity Prediction

Jul 16, 2024

Rıza Özçelik, Francesca Grisoni

Abstract: Deep learning has significantly accelerated drug discovery, with 'chemical language' processing (CLP) emerging as a prominent approach. CLP learns from molecular string representations (e.g., Simplified Molecular Input Line Entry Systems [SMILES] and Self-Referencing Embedded Strings [SELFIES]) with methods akin to natural language processing. Despite their growing importance, training predictive CLP models is far from trivial, as it involves many 'bells and whistles'. Here, we analyze the key elements of CLP training, to provide guidelines for newcomers and experts alike. Our study spans three neural network architectures, two string representations, three embedding strategies, across ten bioactivity datasets, for both classification and regression purposes. This 'hitchhiker's guide' not only underscores the importance of certain methodological choices, but it also equips researchers with practical recommendations on ideal choices, e.g., in terms of neural network architectures, molecular representations, and hyperparameter optimization.
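To make the idea of treating molecular strings like text concrete, here is a minimal sketch of how a SMILES string might be split into tokens and mapped to integer ids before being passed to a neural network. The regular expression, the function names (`tokenize_smiles`, `build_vocab`, `encode`), and the `<pad>` token are illustrative assumptions, not the authors' pipeline; the abstract does not specify how tokenization or embedding was done.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms,
# two-letter halogens, two-digit ring closures, single-letter atoms,
# ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:~@*$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against characters the pattern silently skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign an integer id to each token; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, then truncate or right-pad to max_len."""
    ids = [vocab[token] for token in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    molecules = ["CC(=O)Oc1ccccc1C(=O)O", "ClCCBr", "[NH4+]"]
    tokenized = [tokenize_smiles(s) for s in molecules]
    vocab = build_vocab(tokenized)
    for smiles, tokens in zip(molecules, tokenized):
        print(smiles, tokens, encode(tokens, vocab, max_len=24))
```

The resulting id sequences are what an embedding layer or one-hot encoder would consume. A SELFIES pipeline would follow the same steps with a different tokenizer, since SELFIES symbols are already delimited by square brackets.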
