State Key Labratory of Traction Power, Southwest Jiaotong University, Chengdu, China
Abstract:Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis, operating diffusion processes on continuous VAE latent, which significantly differ from the text generation methods employed by Large Language Models (LLMs). In this paper, we introduce a novel generative framework, the Recurrent Diffusion Probabilistic Model (RDPM), which enhances the diffusion process through a recurrent token prediction mechanism, thereby pioneering the field of Discrete Diffusion. By progressively introducing Gaussian noise into the latent representations of images and encoding them into vector-quantized tokens in a recurrent manner, RDPM facilitates a unique diffusion process on discrete-value domains. This process iteratively predicts the token codes for subsequent timesteps, transforming the initial standard Gaussian noise into the source data distribution, aligning with GPT-style models in terms of the loss function. RDPM demonstrates superior performance while benefiting from the speed advantage of requiring only a few inference steps. This model not only leverages the diffusion process to ensure high-quality generation but also converts continuous signals into a series of high-fidelity discrete tokens, thereby maintaining a unified optimization strategy with other discrete tokens, such as text. We anticipate that this work will contribute to the development of a unified model for multimodal generation, specifically by integrating continuous signal domains such as images, videos, and audio with text. We will release the code and model weights to the open-source community.
Abstract:With the rapid development of artificial intelligence, the combination of material database and machine learning has driven the progress of material informatics. Because aluminum alloy is widely used in many fields, so it is significant to predict the properties of aluminum alloy. In this thesis, the data of Al-Cu-Mg-X (X: Zn, Zr, etc.) alloy are used to input the composition, aging conditions (time and temperature) and predict its hardness. An ensemble learning solution based on automatic machine learning and an attention mechanism introduced into the secondary learner of deep neural network are proposed respectively. The experimental results show that selecting the correct secondary learner can further improve the prediction accuracy of the model. This manuscript introduces the attention mechanism to improve the secondary learner based on deep neural network, and obtains a fusion model with better performance. The R-Square of the best model is 0.9697 and the MAE is 3.4518HV.