Abstract: Most of the existing machine-learning schemes applied to atomic-scale simulations rely on a local description of the geometry of a structure, and struggle to model effects that are driven by long-range physical interactions. Efforts to overcome these limitations have focused on the direct incorporation of electrostatics, which is the most prominent effect, often relying on architectures that mirror the functional form of explicit physical models. Including other forms of non-bonded interactions, or predicting properties other than the interatomic potential, requires ad hoc modifications. We propose an alternative approach that extends the long-distance equivariant (LODE) framework to generate local descriptors of an atomic environment that resemble non-bonded potentials with arbitrary asymptotic behaviors, ranging from point-charge electrostatics to dispersion forces. We show that the LODE formalism is amenable to a direct physical interpretation in terms of a generalized multipole expansion, which simplifies its implementation and reduces the number of descriptors needed to capture a given asymptotic behavior. These generalized LODE features provide improved extrapolation capabilities when trained on structures dominated by a given asymptotic behavior, but do not help capture the wildly different energy scales that are relevant for a more heterogeneous data set. This approach provides a practical scheme to incorporate different types of non-bonded interactions, and a framework to investigate the interplay of physical and data-related considerations that underlie this challenging modeling problem.
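To make the idea of "potential-like local features with a chosen asymptotic behavior" concrete, here is a minimal sketch, not the LODE implementation. It assumes isolated (non-periodic) point particles with weights w_j (for example charges when p = 1). For the far-field potential V(x) = Σ_j w_j |x − r_j|^(−p), it computes the two lowest-order terms of a multipole-like expansion around atom i: the potential value at r_i (l = 0) and the norm of its gradient (a rotation-invariant l = 1 term). The helper name `potential_features` and the exponents used (p = 1 as electrostatic-like, p = 6 as dispersion-like) are illustrative assumptions. The actual framework smears atoms with Gaussians, handles periodic systems, and projects onto a radial and spherical-harmonic basis, and none of that is done here.

```python
import math


def potential_features(positions, weights, i, p):
    """Hypothetical helper: return (V, |grad V|) at atom i for the field
    V(x) = sum_{j != i} w_j / |x - r_j|**p generated by all other atoms.

    V is the l=0 (monopole-like) term; the gradient norm is a
    rotation-invariant l=1 (dipole-like) term of the local expansion.
    """
    xi, yi, zi = positions[i]
    v = 0.0
    gx = gy = gz = 0.0
    for j, ((xj, yj, zj), wj) in enumerate(zip(positions, weights)):
        if j == i:
            continue  # exclude self-interaction
        dx, dy, dz = xj - xi, yj - yi, zj - zi
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        v += wj / r**p
        # d/dx |x - r_j|^(-p) at x = r_i equals p * (r_j - r_i) / r^(p+2)
        s = p * wj / r ** (p + 2)
        gx += s * dx
        gy += s * dy
        gz += s * dz
    return v, math.sqrt(gx * gx + gy * gy + gz * gz)


# Two unit-weight particles 2 apart: electrostatic-like vs dispersion-like decay.
pos = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
w = [1.0, 1.0]
for p in (1, 6):
    print(p, potential_features(pos, w, 0, p))
# 1 (0.5, 0.25)
# 6 (0.015625, 0.046875)
```

Changing the single parameter p changes only the asymptotic decay of the features (doubling all distances scales V by 2^(−p)), which mirrors, in a very simplified form, the abstract's point that one construction can cover interactions from point-charge electrostatics to dispersion.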
Abstract: Achieving a complete and symmetric description of a group of point particles, such as atoms in a molecule, is a common problem in physics and theoretical chemistry. The introduction of machine learning to science has made this issue even more critical, as it underpins the ability of a model to reproduce arbitrary physical relationships, and to do so while being consistent with basic symmetries and conservation laws. However, the descriptors that are commonly used to represent point clouds, most notably those adopted to describe matter at the atomic scale, are unable to distinguish between special arrangements of particles. This makes it impossible to machine-learn their properties. Frameworks that are provably complete exist, but are complete only in the limit in which they simultaneously describe the mutual relationships among all atoms, which is impractical. We introduce, and demonstrate on a particularly insidious class of atomic arrangements, a strategy to build descriptors that rely solely on information about the relative arrangement of triplets of particles, but can be used to construct symmetry-adapted models that have universal approximation power.
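As a lower-order analogue of the degeneracies described above (not the triplet-degenerate atomic structures that this work studies), the sketch below uses two one-dimensional point sets, A = {0, 1, 4, 10, 12, 17} and B = {0, 1, 8, 11, 13, 17}. They have identical multisets of pairwise distances, so a symmetric 2-body descriptor cannot tell them apart, yet they are not related by any translation or reflection. The helper names `pair_multiset`, `triplet_multiset` and `congruent` are hypothetical, and the descriptors are global rather than atom-centered. For this particular pair, triplet information is enough to separate the structures; the abstract's point is that some arrangements defeat triplet-level descriptors too.

```python
from collections import Counter
from itertools import combinations


def pair_multiset(points):
    """Global 2-body descriptor: multiset of all pairwise distances."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def triplet_multiset(points):
    """Global 3-body descriptor: each triangle of collinear points stored as
    its sorted pair of gaps (invariant to translation and reflection)."""
    out = Counter()
    for a, b, c in combinations(sorted(points), 3):
        out[tuple(sorted((b - a, c - b)))] += 1
    return out


def congruent(p, q):
    """True if 1D point sets p and q match up to translation and reflection."""
    p, q = sorted(p), sorted(q)
    shifted = [x - p[0] for x in p]
    mirrored = sorted(p[-1] - x for x in p)
    target = [x - q[0] for x in q]
    return shifted == target or mirrored == target


A = [0, 1, 4, 10, 12, 17]
B = [0, 1, 8, 11, 13, 17]
print(pair_multiset(A) == pair_multiset(B))        # True: same 2-body description
print(congruent(A, B))                             # False: genuinely different sets
print(triplet_multiset(A) == triplet_multiset(B))  # False: triplets separate them
```

The example shows why completeness is tied to body order: each level of correlations resolves some degeneracies of the level below, which is why a scheme that uses only triplet information and still achieves universal approximation is notable.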