Abstract:The field of computer-aided synthesis planning (CASP) has seen rapid advancements in recent years, achieving significant progress across various algorithmic benchmarks. However, chemists often encounter numerous infeasible reactions when using CASP in practice. This article delves into common errors associated with CASP and introduces a product prediction model aimed at enhancing the accuracy of single-step models. While the product prediction model reduces the number of single-step reactions, it integrates multiple single-step models to maintain the overall reaction count and increase reaction diversity. Based on manual analysis and large-scale testing, the product prediction model, combined with the multi-model ensemble approach, has been proven to offer higher feasibility and greater diversity.
Abstract:Single-step retrosynthesis is a crucial task in organic chemistry and drug design, requiring the identification of required reactants to synthesize a specific compound. with the advent of computer-aided synthesis planning, there is growing interest in using machine-learning techniques to facilitate the process. Existing template-free machine learning-based models typically utilize transformer structures and represent molecules as ID sequences. However, these methods often face challenges in fully leveraging the extensive topological information of the molecule and aligning atoms between the production and reactants, leading to results that are not as competitive as those of semi-template models. Our proposed method, Node-Aligned Graph-to-Graph (NAG2G), also serves as a transformer-based template-free model but utilizes 2D molecular graphs and 3D conformation information. Furthermore, our approach simplifies the incorporation of production-reactant atom mapping alignment by leveraging node alignment to determine a specific order for node generation and generating molecular graphs in an auto-regressive manner node-by-node. This method ensures that the node generation order coincides with the node order in the input graph, overcoming the difficulty of determining a specific node generation order in an auto-regressive manner. Our extensive benchmarking results demonstrate that the proposed NAG2G can outperform the previous state-of-the-art baselines in various metrics.
Abstract:The recurrence rebuild and retrieval method (R3M) is proposed in this paper to accelerate the electromagnetic (EM) validations of large-scale digital coding metasurfaces (DCMs). R3M aims to accelerate the EM validations of DCMs with varied codebooks, which involves the analysis of a group of similar but distinct coding patterns. The method transforms general DCMs to rigorously periodic arrays by replacing each coding unit with the macro unit, which comprises all possible coding states. The system matrix corresponding to the rigorously periodic array is globally shared for DCMs with arbitrary codebooks via implicit retrieval. The discrepancy of the interactions for edge and corner units are precluded by the basis extension of periodic boundaries. Moreover, the hierarchical pattern exploitation algorithm is leveraged to efficiently assemble the system matrix for further acceleration. Due to the fully utilization of the rigid periodicity, the computational complexity of R3M is theoretically lower than that of $\mathcal{H}$-matrix within the same paradigm. Numerical results for two types of DCMs indicate that R3M is accurate in comparison with commercial software. Besides, R3M is also compatible with the preconditioning for efficient iterative solutions. The efficiency of R3M for DCMs outperforms the conventional fast algorithms by a large margin in both the storage and CPU time cost.