Abstract:Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
Abstract:This thesis analyzes the challenging problem of Image Deblurring based on classical theorems and state-of-art methods proposed in recent years. By spectral analysis we mathematically show the effective of spectral regularization methods, and point out the linking between the spectral filtering result and the solution of the regularization optimization objective. For ill-posed problems like image deblurring, the optimization objective contains a regularization term (also called the regularization functional) that encodes our prior knowledge into the solution. We demonstrate how to craft a regularization term by hand using the idea of maximum a posterior estimation. Then, we point out the limitations of such regularization-based methods, and step into the neural-network based methods. Based on the idea of Wasserstein generative adversarial models, we can train a CNN to learn the regularization functional. Such data-driven approaches are able to capture the complexity, which may not be analytically modellable. Besides, in recent years with the improvement of architectures, the network has been able to output an image closely approximating the ground truth given the blurry observation. The Generative Adversarial Network (GAN) works on this Image-to-Image translation idea. We analyze the DeblurGAN-v2 method proposed by Orest Kupyn et al. [14] in 2019 based on numerical tests. And, based on the experimental results and our knowledge, we put forward some suggestions for improvement on this method.