Abstract:Topological insulators (TIs) and topological crystalline insulators (TCIs) are materials with unconventional electronic properties, making their discovery highly valuable for practical applications. However, such materials, particularly those with a full band gap, remain scarce. Given the limitations of traditional approaches that scan known materials for candidates, we focus on the generation of new topological materials through a generative model. Specifically, we apply reinforcement fine-tuning (ReFT) to a pre-trained generative model, thereby aligning the model's objectives with our material design goals. We demonstrate that ReFT is effective in enhancing the model's ability to generate TIs and TCIs, with minimal compromise on the stability of the generated materials. Using the fine-tuned model, we successfully identify a large number of new topological materials, with Ge$_2$Bi$_2$O$_6$ serving as a representative example--a TI with a full band gap of 0.26 eV, ranking among the largest known in this category.
Abstract:The use of machine learning methods for predicting the properties of crystalline materials encounters significant challenges, primarily related to input encoding, output versatility, and interpretability. Here, we introduce CrystalBERT, an adaptable transformer-based framework with novel structure that integrates space group, elemental, and unit cell information. The method's adaptability lies not only in its ability to seamlessly combine diverse features but also in its capability to accurately predict a wide range of physically important properties, including topological properties, superconducting transition temperatures, dielectric constants, and more. CrystalBERT also provides insightful physical interpretations regarding the features that most significantly influence the target properties. Our findings indicate that space group and elemental information are more important for predicting topological and superconducting properties, in contrast to some properties that primarily depend on the unit cell information. This underscores the intricate nature of topological and superconducting properties. By incorporating all these features, we achieve a high accuracy of 91% in topological classification, surpassing prior studies and identifying previously misclassified topological materials, further demonstrating the effectiveness of our model.