Abstract: A force field as accurate as quantum mechanics (QM) and as fast as molecular mechanics (MM), with which one can simulate a biomolecular system efficiently and meaningfully enough to gain quantitative insights, is among the most ardent dreams of biophysicists; it is, nevertheless, a dream not to be fulfilled any time soon. Machine learning force fields (MLFFs) represent a meaningful step in this direction: differentiable neural functions are parametrized to fit ab initio energies and, through automatic differentiation, forces. We argue that, as of now, the utility of MLFF models is no longer bottlenecked by accuracy but primarily by their speed (as well as stability and generalizability). On limited chemical spaces, many recent variants have long surpassed the chemical accuracy of $1$ kcal/mol, the empirical threshold beyond which realistic chemical predictions are possible, yet they remain orders of magnitude slower than MM. Hoping to kindle the exploration and design of faster, albeit perhaps slightly less accurate, MLFFs, in this review we focus on the design space (the speed-accuracy tradeoff) between MM and ML force fields. After briefly reviewing the building blocks of force fields of either kind, we discuss the properties desired of force fields and the challenges now faced by the force field development community, survey efforts to make MM force fields more accurate and ML force fields faster, and envision what the next generation of MLFFs might look like.
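The point that forces come "for free" from a differentiable energy through automatic differentiation can be illustrated with a minimal sketch. Here a toy harmonic bond stands in for the neural energy function, and forward-mode dual numbers stand in for the reverse-mode automatic differentiation that MLFF frameworks typically use; all names (`Dual`, `dsqrt`, `harmonic_bond_energy`, `forces`) and parameter values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode AD number val + eps * der, with eps**2 == 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (a + eps a')(b + eps b') = ab + eps (a'b + ab')
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def dsqrt(x: Dual) -> Dual:
    """sqrt with chain rule: d sqrt(u) = u' / (2 sqrt(u))."""
    root = math.sqrt(x.val)
    return Dual(root, x.der / (2.0 * root))


def harmonic_bond_energy(xa, xb, k=500.0, r0=1.0):
    """Toy differentiable energy E = 0.5 * k * (r - r0)^2 (arbitrary units)."""
    d = [a - b for a, b in zip(xa, xb)]
    r = dsqrt(sum((c * c for c in d), Dual(0.0)))
    return 0.5 * k * (r - r0) * (r - r0)


def forces(energy_fn, coords):
    """F = -dE/dx, computed with one forward-mode pass per Cartesian coordinate."""
    result = []
    for i in range(len(coords)):
        row = []
        for j in range(3):
            # Seed derivative 1.0 on coordinate (i, j) and 0.0 everywhere else.
            seeded = [
                [Dual(float(c), 1.0 if (a, b) == (i, j) else 0.0) for b, c in enumerate(atom)]
                for a, atom in enumerate(coords)
            ]
            row.append(-energy_fn(*seeded).der)
        result.append(row)
    return result
```

For example, `forces(harmonic_bond_energy, [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])` returns equal and opposite x-forces pulling the stretched bond back toward $r_0$. Forward mode needs one energy evaluation per Cartesian coordinate (3N passes for N atoms), whereas reverse mode returns all 3N force components from a single backward pass at a small constant multiple of the energy cost, which is one reason practical MLFFs rely on reverse mode.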
Abstract: Molecular mechanics (MM) force fields, the models that characterize the energy landscape of molecular systems via simple pairwise and polynomial terms, have traditionally relied on human-expert-curated, inflexible, and poorly extensible discrete chemical parameter assignment rules, namely atom or valence types. Recently, there has been significant interest in using graph neural networks to replace this process, enabling the parametrization scheme to be learned in an end-to-end differentiable manner directly from quantum chemical calculations or condensed-phase data. In this paper, we extend the Espaloma end-to-end differentiable force field construction approach by incorporating both energy and force fitting to quantum chemical data directly into the training process. Building on the OpenMM SPICE dataset, we curate a dataset covering chemical spaces highly relevant to biomolecular modeling, including small molecules, proteins, and RNA. The resulting force field, espaloma 0.3.0, self-consistently parametrizes these diverse biomolecular species, accurately predicts quantum chemical energies and forces, and maintains stable quantum chemical energy-minimized geometries. Surprisingly, this simple approach produces highly accurate protein-ligand binding free energies when protein and ligand are parametrized self-consistently. Capable of fitting new force fields to large quantum chemical datasets in one GPU-day, this approach shows significant promise as a path toward systematically more accurate force fields that can be easily extended to new chemical domains of interest.
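Joint energy and force fitting can be sketched as follows, under explicit assumptions: a single harmonic bond plays the role of the MM functional form, fixed numbers stand in for the bond parameters ($k$, $r_0$) that a graph neural network would predict, and the loss is a mean squared error on energies centered per molecule (absolute QM and MM energies have different zero points) plus a weighted force term. The function names, the centering, and the weighting are illustrative assumptions, not the espaloma 0.3.0 implementation.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bond_energy_and_forces(xa: Vec3, xb: Vec3, k: float, r0: float):
    """Harmonic bond E = 0.5 * k * (r - r0)**2 with analytic forces F = -dE/dx."""
    d = [a - b for a, b in zip(xa, xb)]
    r = math.sqrt(sum(c * c for c in d))
    energy = 0.5 * k * (r - r0) ** 2
    # dE/dx_a = k * (r - r0) * (x_a - x_b) / r; the force is its negative.
    fa = [-k * (r - r0) * c / r for c in d]
    fb = [-f for f in fa]  # Newton's third law
    return energy, [fa, fb]


def evaluate(confs: Sequence[Tuple[Vec3, Vec3]], k: float, r0: float):
    """Energies and forces for every conformation of one (toy) molecule."""
    results = [bond_energy_and_forces(xa, xb, k, r0) for xa, xb in confs]
    return [e for e, _ in results], [f for _, f in results]


def energy_force_loss(e_pred, e_ref, f_pred, f_ref, force_weight: float = 1.0) -> float:
    """MSE on per-molecule-centered energies plus weighted MSE on force components."""
    n = len(e_pred)
    mean_p, mean_r = sum(e_pred) / n, sum(e_ref) / n
    # Centering removes the arbitrary zero point separating QM and MM energies.
    e_loss = sum(((p - mean_p) - (r - mean_r)) ** 2 for p, r in zip(e_pred, e_ref)) / n
    f_sq = [
        (p - r) ** 2
        for conf_p, conf_r in zip(f_pred, f_ref)
        for atom_p, atom_r in zip(conf_p, conf_r)
        for p, r in zip(atom_p, atom_r)
    ]
    return e_loss + force_weight * sum(f_sq) / len(f_sq)


# Usage: reference data from hypothetical "true" parameters, shifted by a constant.
confs = [((0.0, 0.0, 0.0), (r, 0.0, 0.0)) for r in (0.9, 1.0, 1.1, 1.2)]
e_ref, f_ref = evaluate(confs, k=500.0, r0=1.0)
e_ref = [e + 3.7 for e in e_ref]  # constant offset, removed by centering
e_fit, f_fit = evaluate(confs, k=400.0, r0=1.05)  # a poorer parameter guess
print(energy_force_loss(e_fit, e_ref, f_fit, f_ref))
```

Because forces are derivatives of the same parametrized energy, each conformation contributes 3N force targets alongside its single energy, which constrains the parameters more tightly than energies alone.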
Abstract: Atomic partial charges are crucial parameters in molecular dynamics (MD) simulation, dictating the electrostatic contributions to intermolecular energies and thereby the potential energy landscape. Traditionally, partial charges have been assigned using semiempirical quantum chemical surrogates of \textit{ab initio} methods, such as AM1-BCC, which are expensive for large systems or large numbers of molecules. We propose a hybrid physical / graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to the differences between AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach: the network predicts molecule-specific atomic electronegativity and hardness parameters, from which the optimal charge-equilibrated partial charges are determined analytically under the constraint that total molecular charge is preserved. This hybrid approach scales linearly with the number of atoms, enabling, for the first time, the use of fully consistent charge models for small molecules and biopolymers in the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package \texttt{espaloma\_charge}, this approach provides drop-in replacements for both the AmberTools \texttt{antechamber} and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at \url{https://github.com/choderalab/espaloma_charge}.
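The analytical charge-equilibration step can be sketched as follows, assuming the standard quadratic form in which each atom $i$ carries an electronegativity $e_i$ and a hardness $s_i > 0$ (supplied directly here; in the described approach a graph neural network predicts them), and the charges minimize $\sum_i (e_i q_i + \tfrac{1}{2} s_i q_i^2)$ subject to $\sum_i q_i = Q$. Introducing a Lagrange multiplier $\lambda$ gives $e_i + s_i q_i = \lambda$ for every atom, hence $q_i = -e_i/s_i + (Q + \sum_j e_j/s_j)/(s_i \sum_j 1/s_j)$. The function name `equilibrate_charges` is hypothetical and is not the \texttt{espaloma\_charge} API.

```python
from typing import List, Sequence


def equilibrate_charges(
    e: Sequence[float], s: Sequence[float], total_charge: float = 0.0
) -> List[float]:
    """Closed-form charges minimizing sum(e_i q_i + 0.5 s_i q_i^2) s.t. sum(q_i) = total_charge.

    Every hardness s_i must be positive so the objective is strictly convex.
    """
    if len(e) != len(s) or not e:
        raise ValueError("e and s must be non-empty and of equal length")
    if any(si <= 0 for si in s):
        raise ValueError("hardness values must be positive")
    inv_s_sum = sum(1.0 / si for si in s)                 # sum_j 1/s_j
    e_over_s_sum = sum(ei / si for ei, si in zip(e, s))   # sum_j e_j/s_j
    lam = (total_charge + e_over_s_sum) / inv_s_sum       # Lagrange multiplier
    # Stationarity e_i + s_i q_i = lam  =>  q_i = (lam - e_i) / s_i
    return [(lam - ei) / si for ei, si in zip(e, s)]
```

For example, `equilibrate_charges([1.0, -1.0], [2.0, 2.0])` assigns the negative charge to the atom with the larger $e_i$. Because the solution needs only two sums over atoms, its cost grows linearly with system size.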