Abstract:In solid-state materials science, substantial efforts have been devoted to the calculation and modeling of the electronic band gap. While a wide range of ab initio methods and machine learning algorithms have been created that can predict this quantity, the development of new computational approaches for studying the band gap remains an active area of research. Here we introduce a simple machine learning model for predicting the band gap using only the chemical composition of the crystalline material. To motivate the form of the model, we first analyze the empirical distribution of the band gap, which sheds new light on its atypical statistics. Specifically, our analysis enables us to frame band gap prediction as a task of modeling a mixed random variable, and we design our model accordingly. Our model formulation incorporates thematic ideas from chemical heuristic models for other material properties in a manner that is suited towards the band gap modeling task. The model has exactly one parameter corresponding to each element, which is fit using data. To predict the band gap for a given material, the model computes a weighted average of the parameters associated with its constituent elements and then takes the maximum of this quantity and zero. The model provides heuristic chemical interpretability by intuitively capturing the associations between the band gap and individual chemical elements.
Abstract:Artificial intelligence (AI) has revolutionized the field of materials science by improving the prediction of properties and accelerating the discovery of novel materials. In recent years, publicly available material data repositories containing data for various material properties have grown rapidly. In this work, we introduce Multimodal Learning for Crystalline Materials (MLCM), a new method for training a foundation model for crystalline materials via multimodal alignment, where high-dimensional material properties (i.e. modalities) are connected in a shared latent space to produce highly useful material representations. We show the utility of MLCM on multiple axes: (i) MLCM achieves state-of-the-art performance for material property prediction on the challenging Materials Project database; (ii) MLCM enables a novel, highly accurate method for inverse design, allowing one to screen for stable material with desired properties; and (iii) MLCM allows the extraction of interpretable emergent features that may provide insight to material scientists. Further, we explore several novel methods for aligning an arbitrary number of modalities, improving upon prior art in multimodal learning that focuses on bimodal alignment. Our work brings innovations from the ongoing AI revolution into the domain of materials science and identifies materials as a testbed for the next generation of AI.
Abstract:Topological materials present unconventional electronic properties that make them attractive for both basic science and next-generation technological applications. The majority of currently-known topological materials have been discovered using methods that involve symmetry-based analysis of the quantum wavefunction. Here we use machine learning to develop a simple-to-use heuristic chemical rule that diagnoses with a high accuracy whether a material is topological using only its chemical formula. This heuristic rule is based on a notion that we term topogivity, a machine-learned numerical value for each element that loosely captures its tendency to form topological materials. We next implement a high-throughput strategy for discovering topological materials based on the heuristic topogivity-rule prediction followed by ab initio validation. This way, we discover new topological materials that are not diagnosable using symmetry indicators, including several that may be promising for experimental observation.