Abstract:Computationally generating novel synthetically accessible compounds with high affinity and low toxicity is a great challenge in drug design. Machine-learning models beyond conventional pharmacophoric methods have shown promise in generating novel small molecule compounds, but require significant tuning for a specific protein target. Here, we introduce a method called selective iterative latent variable refinement (SILVR) for conditioning an existing diffusion-based equivariant generative model without retraining. The model allows the generation of new molecules that fit into a binding site of a protein based on fragment hits. We use the SARS-CoV-2 Main protease fragments from Diamond X-Chem that form part of the COVID Moonshot project as a reference dataset for conditioning the molecule generation. The SILVR rate controls the extent of conditioning and we show that moderate SILVR rates make it possible to generate new molecules of similar shape to the original fragments, meaning that the new molecules fit the binding site without knowledge of the protein. We can also merge up to 3 fragments into a new molecule without affecting the quality of molecules generated by the underlying generative model. Our method is generalizable to any protein target with known fragments and any diffusion-based model for molecule generation.