
Rami Skaik

MCGM: Mask Conditional Text-to-Image Generative Model

Oct 01, 2024

Rami Skaik, Leonardo Rossi, Tomaso Fontanini, Andrea Prati


Abstract: Recent advances in generative models have transformed the field of artificial intelligence, enabling the creation of highly realistic and detailed images. In this study, we propose a novel Mask Conditional Text-to-Image Generative Model (MCGM) that leverages conditional diffusion models to generate pictures with specific poses. Our model builds on the success of the Break-a-scene [1] model in generating new scenes from a single image containing multiple subjects, and incorporates a mask embedding injection that conditions the generation process. With this additional level of control, MCGM offers a flexible and intuitive approach for generating specific poses for one or more subjects learned from a single image, letting users steer the output to their requirements. Through extensive experimentation and evaluation, we demonstrate that the proposed model generates high-quality images that meet predefined mask conditions and improves on the current Break-a-scene generative model.

* 17 pages, 13 figures, presented at the 5th International Conference on Artificial Intelligence and Machine Learning (CAIML 2024)
