Abstract: The advent of artificial intelligence (AI) has enabled a comprehensive exploration of materials for various applications. However, AI models often prioritize materials that appear frequently in the scientific literature, which limits the selection of suitable candidates based on their inherent physical and chemical properties. To address this imbalance, we have generated a dataset of 1,494,017 natural language-material paragraphs based on the combined OQMD, Materials Project, JARVIS, COD and AFLOW2 databases, which are dominated by ab initio calculations and tend to be much more evenly distributed across the periodic table. The generated text narratives were then polled and scored by both human experts and ChatGPT-4 on three rubrics: technical accuracy, language and structure, and relevance and depth of content. The two sets of scores were similar, with the human-scored depth of content lagging the most. The merger of multi-modality data sources and large language models (LLMs) holds immense potential for AI frameworks to aid the exploration and discovery of solid-state materials for specific applications.
Abstract: We investigate whether large language models can perform the creative hypothesis generation that human researchers regularly do. While the error rate is high, generative AI seems to be able to effectively structure vast amounts of scientific knowledge and provide interesting and testable hypotheses. The future scientific enterprise may include synergistic efforts with a swarm of "hypothesis machines", challenged by automated experimentation and adversarial peer reviews.
Abstract: The dipole moment is a physical quantity that indicates the polarity of a molecule; it is determined by the electrical properties of the constituent atoms and the geometry of the molecule. Most embeddings used in traditional graph neural network methodologies treat molecules as topological graphs, which creates a significant barrier to recognizing geometric information. Unlike existing equivariant embeddings, which have been proposed to handle the 3D structure of molecules properly, our proposed embeddings directly express the physical implications of the local contributions to the dipole moment. We show that the developed model works reasonably well even for molecules with extended geometries and captures more interatomic interaction information, significantly improving the prediction results and reaching accuracy comparable to ab initio calculations.
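The idea of a dipole moment built from local, per-atom contributions can be illustrated with the textbook point-charge approximation, in which the dipole is the sum of each partial charge times its position. The sketch below is not the embedding or model described in the abstract; the function names, the partial charges, and the rough water-like geometry are hypothetical values chosen only for illustration.

```python
import math

# 1 e*Angstrom is approximately 4.803 Debye (unit conversion constant).
E_ANGSTROM_TO_DEBYE = 4.80320


def dipole_moment(charges, positions, origin=(0.0, 0.0, 0.0)):
    """Point-charge dipole: mu = sum_i q_i * (r_i - origin).

    charges   -- partial charges in units of e (assumed inputs)
    positions -- (x, y, z) coordinates in Angstrom
    Returns the dipole vector in e*Angstrom.
    """
    mu = [0.0, 0.0, 0.0]
    for q, r in zip(charges, positions):
        for k in range(3):
            # Each atom adds its own local contribution q_i * r_i.
            mu[k] += q * (r[k] - origin[k])
    return tuple(mu)


def magnitude(vec):
    """Euclidean norm of a 3-vector."""
    return math.sqrt(sum(c * c for c in vec))


# Hypothetical water-like molecule: illustrative charges and geometry only.
charges = [-0.8, 0.4, 0.4]
positions = [(0.0, 0.0, 0.0), (0.76, 0.59, 0.0), (-0.76, 0.59, 0.0)]

mu = dipole_moment(charges, positions)
print("dipole (e*A):", mu)
print("magnitude (D):", magnitude(mu) * E_ANGSTROM_TO_DEBYE)
```

Because the example molecule is neutral, the result does not depend on the choice of `origin`. The sum depends explicitly on atomic positions, which is why a purely topological graph representation cannot recover it without geometric information.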