Abstract:We present the Unit Region Encoding of floorplans, which is a unified and compact geometry-aware encoding representation for various applications, ranging from interior space planning, floorplan metric learning to floorplan generation tasks. The floorplans are represented as the latent encodings on a set of boundary-adaptive unit region partition based on the clustering of the proposed geometry-aware density map. The latent encodings are extracted by a trained network (URE-Net) from the input dense density map and other available semantic maps. Compared to the over-segmented rasterized images and the room-level graph structures, our representation can be flexibly adapted to different applications with the sliced unit regions while achieving higher accuracy performance and better visual quality. We conduct a variety of experiments and compare to the state-of-the-art methods on the aforementioned applications to validate the superiority of our representation, as well as extensive ablation studies to demonstrate the effect of our slicing choices.
Abstract:Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need of further development. In this paper, we present LocRef-Diffusion, a novel, tuning-free model capable of personalized customization of multiple instances' appearance and position within an image. To enhance the precision of instance placement, we introduce a Layout-net, which controls instance generation locations by leveraging both explicit instance layout information and an instance region cross-attention module. To improve the appearance fidelity to reference images, we employ an appearance-net that extracts instance appearance features and integrates them into the diffusion model through cross-attention mechanisms. We conducted extensive experiments on the COCO and OpenImages datasets, and the results demonstrate that our proposed method achieves state-of-the-art performance in layout and appearance guided generation.