Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jonathan S. C Lim

Gotta be SAFE: A New Framework for Molecular Design

Oct 16, 2023

Emmanuel Noutahi, Cristian Gabellini, Michael Craig, Jonathan S. C Lim, Prudencio Tossou

Figure 1 for Gotta be SAFE: A New Framework for Molecular Design

Figure 2 for Gotta be SAFE: A New Framework for Molecular Design

Figure 3 for Gotta be SAFE: A New Framework for Molecular Design

Figure 4 for Gotta be SAFE: A New Framework for Molecular Design

Abstract:Traditional molecular string representations, such as SMILES, often pose challenges for AI-driven molecular design due to their non-sequential depiction of molecular substructures. To address this issue, we introduce Sequential Attachment-based Fragment Embedding (SAFE), a novel line notation for chemical structures. SAFE reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining full compatibility with existing SMILES parsers. It streamlines complex generative tasks, including scaffold decoration, fragment linking, polymer generation, and scaffold hopping, while facilitating autoregressive generation for fragment-constrained design, thereby eliminating the need for intricate decoding or graph-based models. We demonstrate the effectiveness of SAFE by training an 87-million-parameter GPT2-like model on a dataset containing 1.1 billion SAFE representations. Through extensive experimentation, we show that our SAFE-GPT model exhibits versatile and robust optimization performance. SAFE opens up new avenues for the rapid exploration of chemical space under various constraints, promising breakthroughs in AI-driven molecular design.

* Submitted to a workshop at Neurips 2023

Via

Access Paper or Ask Questions