Abstract: The intersection of deep learning and symbolic mathematics has seen rapid progress in recent years, exemplified by the work of Lample and Charton. They demonstrated that effective training of machine learning models for solving mathematical problems critically depends on high-quality, domain-specific datasets. In this paper, we address the computation of Gr\"obner bases using Transformers. While a dataset generation method tailored to Transformer-based Gr\"obner basis computation has previously been proposed, it lacked theoretical guarantees regarding the generality or quality of the generated datasets. In this work, we prove that datasets generated by the previously proposed algorithm are sufficiently general, ensuring that Transformers trained on them can learn a sufficiently diverse range of Gr\"obner bases. Moreover, we propose an extended and generalized algorithm to systematically construct datasets of ideal generators, further enhancing the training effectiveness of Transformers. Our results provide a rigorous geometric foundation for Transformers to address a mathematical problem, answering Lample and Charton's idea of training on diverse or representative inputs.
Abstract: Solving a polynomial system, or computing an associated Gr\"obner basis, has been a fundamental task in computational algebra. However, it is also known for its notoriously expensive computational cost: doubly exponential time complexity in the number of variables in the worst case. In this paper, we achieve, for the first time, Gr\"obner basis computation through the training of a Transformer. The training requires many pairs of polynomial systems and their associated Gr\"obner bases, which motivates us to address two novel algebraic problems: the random generation of Gr\"obner bases and their transformation into non-Gr\"obner polynomial systems, termed the \textit{backward Gr\"obner problem}. We resolve these problems for zero-dimensional radical ideals, a class of ideals appearing in various applications. The experiments show that in the five-variate case, the proposed dataset generation method is five orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gr\"obner bases.
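To make the two dataset-generation subproblems concrete, the following Python sketch uses SymPy as a stand-in computer-algebra back end. The toy basis, the upper-triangular unimodular matrix, and all names here are illustrative assumptions, not the paper's exact sampling scheme: it starts from a lexicographic Gr\"obner basis G of a zero-dimensional radical ideal and applies a random unimodular polynomial matrix to obtain a non-Gr\"obner generating set F of the same ideal (one instance of the backward Gr\"obner problem), then recovers G from F with a standard Buchberger-based routine, which is the forward task a Transformer would be trained on.

    import random
    from sympy import symbols, groebner, expand

    x, y, z = symbols("x y z")

    # Toy lexicographic Groebner basis (x > y > z) of a zero-dimensional
    # radical ideal in shape position: the leading terms x, y, z**4 are
    # pairwise coprime, so G is a Groebner basis by Buchberger's first
    # criterion, and z**4 - 1 is squarefree, so the ideal is radical.
    G = [x - z**2, y - z**3 + z, z**4 - 1]

    def random_linear():
        # Random polynomial of degree <= 1 with small integer coefficients.
        return sum(random.randint(-2, 2) * m for m in (1, x, y, z))

    # Backward Groebner problem (sketch): F = A * G for a unimodular
    # polynomial matrix A (upper triangular with ones on the diagonal, so
    # det A = 1). F generates the same ideal as G but is generally no
    # longer a Groebner basis.
    n = len(G)
    A = [[1 if i == j else (random_linear() if j > i else 0)
          for j in range(n)] for i in range(n)]
    F = [expand(sum(A[i][j] * G[j] for j in range(n))) for i in range(n)]

    # Forward task (what the Transformer learns): recover G from F.
    recovered = groebner(F, x, y, z, order="lex")
    print("Non-Groebner generators F:", F)
    print("Recovered Groebner basis:", list(recovered.exprs))

Note the asymmetry this sketch exhibits: the backward direction G -> F costs only a few polynomial multiplications, while the forward direction F -> G invokes a full Gr\"obner basis computation, which is why generating training pairs backward is cheap.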