Abstract:Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need of further development. In this paper, we present LocRef-Diffusion, a novel, tuning-free model capable of personalized customization of multiple instances' appearance and position within an image. To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations by leveraging both explicit instance layout information and an instance region cross-attention module. To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features and integrates them into the diffusion model through cross-attention mechanisms. We conducted extensive experiments on the COCO and OpenImages datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance in layout and appearance guided generation.
Abstract:Entity extraction is a key technology for obtaining information from massive texts in natural language processing. The further interaction between them does not meet the standards of human reading comprehension, thus limiting the understanding of the model, and also the omission or misjudgment of the answer (ie the target entity) due to the reasoning question. An effective MRC-based entity extraction model-MRC-I2DP, which uses the proposed gated attention-attracting mechanism to adjust the restoration of each part of the text pair, creating problems and thinking for multi-level interactive attention calculations to increase the target entity It also uses the proposed 2D probability coding module, TALU function and mask mechanism to strengthen the detection of all possible targets of the target, thereby improving the probability and accuracy of prediction. Experiments have proved that MRC-I2DP represents an overall state-of-the-art model in 7 from the scientific and public domains, achieving a performance improvement of up to compared to the model model in F1.