Title: Difformer: Empowering Diffusion Model on Embedding Space for Text Generation

Dec 19, 2022

Zhujin Gao, Junliang Guo, Xu Tan, Yongxin Zhu, Fang Zhang, Jiang Bian, Linli Xu

Abstract: Diffusion models have achieved state-of-the-art synthesis quality on visual and audio tasks, and recent work adapts them to textual data by diffusing on the embedding space. However, the difference between the continuous data space and the embedding space poses challenges for diffusion models that have not been carefully explored. In this paper, we conduct systematic studies and identify three challenges. First, the data distribution is learnable for embeddings, which may lead to the collapse of the loss function. Second, because the norm of an embedding varies between popular and rare words, adding the same noise scale to every token leads to sub-optimal results. Third, we find that noise sampled from a standard Gaussian distribution may distract the diffusion process. To address these challenges, we propose Difformer, a denoising diffusion probabilistic model based on Transformer, which combines three techniques: an anchor loss function, a layer normalization module for embeddings, and a norm factor applied to the Gaussian noise. The techniques are complementary, and each is critical to boosting model performance. Experiments are conducted on benchmark datasets for two seminal text generation tasks, machine translation and text summarization. The results show that Difformer significantly outperforms the embedding diffusion baselines while achieving results competitive with strong autoregressive baselines.
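
To make two of the ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the standard DDPM forward step, z_t = sqrt(alpha_bar) * z_0 + sqrt(1 - alpha_bar) * eps, and shows (a) how a layer normalization without learned parameters puts embeddings of very different norms on a common scale, and (b) how a multiplicative factor on the Gaussian noise could be applied. The names `layer_norm`, `add_noise`, and `noise_factor`, as well as the toy embedding values, are hypothetical and chosen only for illustration. The anchor loss is not sketched because the abstract does not define it.

```python
import math
import random


def layer_norm(vec, eps=1e-5):
    """Normalize one embedding to zero mean and unit variance.

    Assumption: no learned gain or bias, to keep the sketch minimal.
    """
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def l2_norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def add_noise(z0, alpha_bar, noise_factor=1.0, rng=random):
    """Sample z_t from the forward diffusion step.

    z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * noise_factor * eps,
    with eps drawn from a standard Gaussian. `noise_factor` is a
    hypothetical scalar; 1.0 recovers the standard DDPM step.
    """
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar) * noise_factor
    return [signal * x + sigma * rng.gauss(0.0, 1.0) for x in z0]


# Two toy embeddings with very different norms (illustrative values only).
token_a = [0.1, -0.2, 0.05, 0.15]
token_b = [2.0, -3.0, 1.5, -0.5]

# After layer normalization both have an L2 norm close to sqrt(dim) = 2,
# so the same noise scale perturbs them to a comparable degree.
normed_a = layer_norm(token_a)
normed_b = layer_norm(token_b)
print(l2_norm(token_a), l2_norm(token_b))
print(l2_norm(normed_a), l2_norm(normed_b))

# One noisy sample at an intermediate step, with a larger noise factor.
z_t = add_noise(normed_a, alpha_bar=0.5, noise_factor=2.0)
print(z_t)
```

In this sketch the normalization equalizes the signal scale across tokens, while `noise_factor` enlarges the noise relative to that fixed scale. How the paper actually sets or schedules the factor is not stated in the abstract.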

* Work in progress
