Abstract: Neural network (NN) model chemistries (MCs) promise to facilitate the accurate exploration of chemical space and simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and `test data' chosen randomly from some sampling techniques can provide poor information about generality. If the sampling method is narrow, `test error' can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling (NMS), and one uncommon alternative, metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NN MC software package with a cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space, while remaining relevant to chemistry near $k_BT$. It is one cheap tool to address the issue of generalization.
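As a concrete illustration of how such a sampler can bolt onto an existing NN MC, the following minimal Python sketch deposits repulsive Gaussians on previously visited geometries and drives the trajectory with the model's forces plus the bias force, so the walk keeps pushing into unvisited configurations. The geometry-distance metric, hill height and width, and the damped-dynamics update are illustrative assumptions rather than the implementation described in the manuscript; they are only meant to show why the per-step cost of the bias stays linear in the number of atoms.

import numpy as np

def bias_energy_and_grad(coords, hills, height=0.05, width=0.5):
    # Repulsive Gaussians centered on previously visited geometries ("hills").
    # A per-atom mean-squared-displacement metric keeps both the energy and
    # its gradient O(N_atoms) per stored hill.  Metric and parameter values
    # are illustrative assumptions, not the paper's exact choices.
    n = coords.shape[0]
    energy = 0.0
    grad = np.zeros_like(coords)
    for h in hills:
        diff = coords - h
        d2 = np.sum(diff ** 2) / n
        g = height * np.exp(-d2 / (2.0 * width ** 2))
        energy += g
        grad += -g * diff / (n * width ** 2)   # d(bias energy)/d(coords)
    return energy, grad

def metamd_sample(coords, force_fn, n_steps=2000, dt=0.5, deposit_every=50):
    # Crude damped-dynamics loop: the model's forces plus the bias force push
    # the geometry away from regions it has already sampled; each deposited
    # hill is also kept as a candidate training geometry.
    hills, samples = [], []
    vel = np.zeros_like(coords)
    for step in range(n_steps):
        _, bias_grad = bias_energy_and_grad(coords, hills)
        force = force_fn(coords) - bias_grad   # bias force = -grad of bias energy
        vel = 0.9 * vel + dt * force           # damping plays the role of a thermostat near k_B T
        coords = coords + dt * vel
        if step % deposit_every == 0:
            hills.append(coords.copy())
            samples.append(coords.copy())
    return samples

Here force_fn stands for whatever routine the NN MC exposes for forces on a geometry; the collected samples would then be labeled with the reference method and added to the training set.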
Abstract: In ``Inverse subsumption for complete explanatory induction'', Yamamoto et al. investigate which inductive logic programming systems can learn a correct hypothesis $H$ by using inverse subsumption instead of inverse entailment. We prove that the inductive logic programming system Imparo is complete by inverse subsumption for learning a correct definite hypothesis $H$ with respect to the definite background theory $B$ and ground atomic examples $E$, by establishing that there exists a connected theory $T$ for $B$ and $E$ such that $H$ subsumes $T$.
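For readers less familiar with the central relation, the following is the standard textbook definition of clause and theory subsumption assumed here (it is not quoted from the paper itself):
\[
C \succeq D \;\iff\; \exists\,\theta \ \text{such that}\ C\theta \subseteq D,
\qquad
H \succeq T \;\iff\; \forall\, D \in T\ \exists\, C \in H\ \text{with}\ C \succeq D.
\]
Subsumption implies entailment (though not conversely), so it is a purely syntactic relation to invert.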