Abstract:To date, the International Zeolite Association Structure Commission (IZA-SC) has cataloged merely 255 distinct zeolite structures, with millions of theoretically possible structures yet to be discovered. The synthesis of a specific zeolite typically necessitates the use of an organic structure-directing agent (OSDA), since the selectivity for a particular zeolite is largely determined by the affinity between the OSDA and the zeolite. Therefore, finding the best affinity OSDA-zeolite pair is the key to the synthesis of targeted zeolite. However, OSDA-zeolite pairs frequently exhibit complex geometric structures, i.e., a complex crystal structure formed by a large number of atoms. Although some existing machine learning methods can represent the periodicity of crystals, they cannot accurately represent crystal structures with local variability. To address this issue, we propose a novel approach called Zeoformer, which can effectively represent coarse-grained crystal periodicity and fine-grained local variability. Zeoformer reconstructs the unit cell centered around each atom and encodes the pairwise distances between this central atom and other atoms within the reconstructed unit cell. The introduction of pairwise distances within the reconstructed unit cell more effectively represents the overall structure of the unit cell and the differences between different unit cells, enabling the model to more accurately and efficiently predict the properties of OSDA-zeolite pairs and general crystal structures. Through comprehensive evaluation, our Zeoformer model demonstrates the best performance on OSDA-zeolite pair datasets and two types of crystal material datasets.
Abstract:Recently, diffusion models have achieved remarkable performance in data generation, e.g., generating high-quality images. Nevertheless, chemistry molecules often have complex non-Euclidean spatial structures, with the behavior changing dynamically and unpredictably. Most existing diffusion models highly rely on computing the probability distribution, i.e., Gaussian distribution, in Euclidean space, which cannot capture internal non-Euclidean structures of molecules, especially the hierarchical structures of the implicit manifold surface represented by molecules. It has been observed that the complex hierarchical structures in hyperbolic embedding space become more prominent and easier to be captured. In order to leverage both the data generation power of diffusion models and the strong capability to extract complex geometric features of hyperbolic embedding, we propose to extend the diffusion model to hyperbolic manifolds for molecule generation, namely, Hyperbolic Graph Diffusion Model (HGDM). The proposed HGDM employs a hyperbolic variational autoencoder to generate the hyperbolic hidden representation of nodes and then a score-based hyperbolic graph neural network is used to learn the distribution in hyperbolic space. Numerical experimental results show that the proposed HGDM achieves higher performance on several molecular datasets, compared with state-of-the-art methods.