Abstract:Generative models on discrete state-spaces have a wide range of potential applications, particularly in the domain of natural sciences. In continuous state-spaces, controllable and flexible generation of samples with desired properties has been realized using guidance on diffusion and flow models. However, these guidance approaches are not readily amenable to discrete state-space models. Consequently, we introduce a general and principled method for applying guidance on such models. Our method depends on leveraging continuous-time Markov processes on discrete state-spaces, which unlocks computational tractability for sampling from a desired guided distribution. We demonstrate the utility of our approach, Discrete Guidance, on a range of applications including guided generation of images, small-molecules, DNA sequences and protein sequences.
Abstract:The need for function estimation in label-limited settings is common in the natural sciences. At the same time, prior knowledge of function values is often available in these domains. For example, data-free biophysics-based models can be informative on protein properties, while quantum-based computations can be informative on small molecule properties. How can we coherently leverage such prior knowledge to help improve a neural network model that is quite accurate in some regions of input space -- typically near the training data -- but wildly wrong in other regions? Bayesian neural networks (BNN) enable the user to specify prior information only on the neural network weights, not directly on the function values. Moreover, there is in general no clear mapping between these. Herein, we tackle this problem by developing an approach to augment BNNs with prior information on the function values themselves. Our probabilistic approach yields predictions that rely more heavily on the prior information when the epistemic uncertainty is large, and more heavily on the neural network when the epistemic uncertainty is small.
Abstract:Drug discovery projects entail cycles of design, synthesis, and testing that yield a series of chemically related small molecules whose properties, such as binding affinity to a given target protein, are progressively tailored to a particular drug discovery goal. The use of deep learning technologies could augment the typical practice of using human intuition in the design cycle, and thereby expedite drug discovery projects. Here we present DESMILES, a deep neural network model that advances the state of the art in machine learning approaches to molecular design. We applied DESMILES to a previously published benchmark that assesses the ability of a method to modify input molecules to inhibit the dopamine receptor D2, and DESMILES yielded a 77% lower failure rate compared to state-of-the-art models. To explain the ability of DESMILES to hone molecular properties, we visualize a layer of the DESMILES network, and further demonstrate this ability by using DESMILES to tailor the same molecules used in the D2 benchmark test to dock more potently against seven different receptors.