Abstract: This work addresses multi-view perspective RGB image generation from text prompts, given Bird's-Eye-View (BEV) semantics. Unlike prior methods that neglect layout consistency, cannot handle detailed text prompts, or fail to generalize to unseen viewpoints, MVPbev simultaneously generates cross-view consistent images of different perspective views with a two-stage design, allowing object-level control and novel view generation at test time. Specifically, MVPbev first projects the given BEV semantics to the perspective views using camera parameters, which enables the model to generalize to unseen viewpoints. We then introduce a multi-view attention module whose special initialization and denoising processes explicitly enforce local consistency among overlapping views w.r.t. cross-view homography. Finally, MVPbev allows test-time instance-level controllability by refining a pre-trained text-to-image diffusion model. Our extensive experiments on NuScenes demonstrate that our method generates high-resolution photorealistic images from text descriptions with thousands of training samples, surpassing state-of-the-art methods under various evaluation metrics. We further demonstrate the advantages of our method in terms of generalizability and controllability with the help of novel evaluation metrics and comprehensive human analysis. Our code, data, and model can be found at \url{https://github.com/kkaiwwana/MVPbev}.
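To make the first stage concrete, the following is a minimal sketch of projecting BEV ground-plane points into a perspective view with a pinhole camera model. It is not taken from the MVPbev code base: the function name `project_bev_to_image`, the flat-ground assumption (z = 0), and the ego-to-camera convention for `R` and `t` are all illustrative assumptions.

```python
import numpy as np


def project_bev_to_image(points_bev, K, R, t):
    """Project BEV points (x, y) in the ego frame to pixel coordinates.

    Hypothetical helper, not the authors' implementation.
    K: 3x3 camera intrinsics.
    R, t: assumed ego-to-camera rotation (3x3) and translation (3,).
    Returns (pixels, mask): pixels is N x 2 (NaN where not visible),
    mask is True for points in front of the camera.
    """
    n = points_bev.shape[0]
    # Assumption: BEV semantics lie on a flat ground plane, z = 0.
    pts_ego = np.hstack([points_bev, np.zeros((n, 1))])
    # Rigid transform from the ego frame into the camera frame.
    pts_cam = pts_ego @ R.T + t
    # Only points with positive depth can appear in the image.
    mask = pts_cam[:, 2] > 1e-6
    # Pinhole projection: apply intrinsics, then divide by depth.
    uvw = pts_cam @ K.T
    pixels = np.full((n, 2), np.nan)
    pixels[mask] = uvw[mask, :2] / uvw[mask, 2:3]
    return pixels, mask


if __name__ == "__main__":
    # Illustrative values only: a forward-facing camera 1.5 m above the ground.
    K = np.array([[1000.0, 0.0, 800.0],
                  [0.0, 1000.0, 450.0],
                  [0.0, 0.0, 1.0]])
    # Ego frame: x forward, y left, z up. Camera frame: x right, y down, z forward.
    R = np.array([[0.0, -1.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [1.0, 0.0, 0.0]])
    t = np.array([0.0, 1.5, 0.0])
    bev = np.array([[10.0, 0.0], [10.0, 2.0], [-5.0, 0.0]])
    pixels, mask = project_bev_to_image(bev, K, R, t)
    print(pixels, mask)
```

Because the same ground-plane points can be projected into any camera whose parameters are known, a mapping of this kind is one way the projected semantics could be produced for viewpoints not seen during training. Points behind the camera are masked out rather than projected.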
Abstract: \begin{abstract} In recent years, Finger Texture (FT) has attracted considerable attention as a biometric characteristic. It can provide efficient human recognition performance because it carries distinct, person-specific features: visible lines, wrinkles and ridges distributed along the inner surface of all fingers. These pattern structures are also reliable, unique and stable throughout a person's life, so efficient biometric systems can be established based on FTs alone. In this paper, a comprehensive survey of the relevant FT studies is presented. We also summarise the main drawbacks and obstacles of employing the FT as a biometric characteristic, and provide suggestions for further improving work on FT. \end{abstract}