Abstract:Contactless fingerprint is a newly developed type of fingerprint, and has gained lots of attention in recent fingerprint studies. However, most existing contactless fingerprint algorithms treat contactless fingerprints as 2D plain fingerprints, and utilize similar recognition methods as traditional contact-based 2D fingerprints. This recognition approach does not consider the modality difference between contactless and contact fingerprints, especially the intrinsic 3D characteristic of contactless fingerprints. This paper proposes a novel contactless fingerprint recognition algorithm that captures the revealed 3D feature of contactless fingerprints rather than the plain 2D feature. The proposed method first recovers 3D features from the input contactless fingerprint, including the 3D shape model and 3D fingerprint feature (minutiae, orientation, etc.). Then, a novel 3D graph matching is conducted in 3D space according to the extracted 3D feature. Our method captures the real 3D nature of contactless fingerprints as the whole feature extraction and matching algorithms are completed in real 3D space. Experiments results on contactless fingerprint databases show that the proposed method successfully improves the matching accuracy of contactless fingerprints. Exceptionally, our method performs stably across multiple poses of contactless fingerprints due to 3D graph matching, which is a great advantage compared to previous contactless fingerprint recognition algorithms.
Abstract:This paper investigates a wireless network consisting of an unmanned aerial vehicle (UAV) base station (BS), a fully-connected reconfigurable intelligent surface (FC-RIS), and multiple users, where the downlink signal can simultaneously be captured by an aerial eavesdropper at a random location. To improve the physical-layer security (PLS) of the considered downlink multiuser communications, we propose the fully-connected reconfigurable intelligent surface aided round-robin scheduling (FCR-RS) and the FC-RIS and ground channel state information (CSI) aided proportional fair scheduling (FCR-GCSI-PFS) schemes. Thereafter, we derive closed-form expressions of the zero secrecy rate probability (ZSRP). Numerical results not only validate the closed-form ZSRP analysis, but also verify that the proposed GCSI-PFS scheme obtains the same performance gain as the full-CSI-aided PFS in FC-RIS-aided communications. Furthermore, optimizing the hovering altitude remarkably enhances the PLS of the FC-RIS and UAV empowered multiuser communications.
Abstract:The new trend in multi-object tracking task is to track objects of interest using natural language. However, the scarcity of paired prompt-instance data hinders its progress. To address this challenge, we propose a high-quality yet low-cost data generation method base on Unreal Engine 5 and construct a brand-new benchmark dataset, named Refer-UE-City, which primarily includes scenes from intersection surveillance videos, detailing the appearance and actions of people and vehicles. Specifically, it provides 14 videos with a total of 714 expressions, and is comparable in scale to the Refer-KITTI dataset. Additionally, we propose a multi-level semantic-guided multi-object framework called MLS-Track, where the interaction between the model and text is enhanced layer by layer through the introduction of Semantic Guidance Module (SGM) and Semantic Correlation Branch (SCB). Extensive experiments on Refer-UE-City and Refer-KITTI datasets demonstrate the effectiveness of our proposed framework and it achieves state-of-the-art performance. Code and datatsets will be available.
Abstract:Automated Valet Parking (AVP) requires precise localization in challenging garage conditions, including poor lighting, sparse textures, repetitive structures, dynamic scenes, and the absence of Global Positioning System (GPS) signals, which often pose problems for conventional localization methods. To address these adversities, we present AVM-SLAM, a semantic visual SLAM framework with multi-sensor fusion in a Bird's Eye View (BEV). Our framework integrates four fisheye cameras, four wheel encoders, and an Inertial Measurement Unit (IMU). The fisheye cameras form an Around View Monitor (AVM) subsystem, generating BEV images. Convolutional Neural Networks (CNNs) extract semantic features from these images, aiding in mapping and localization tasks. These semantic features provide long-term stability and perspective invariance, effectively mitigating environmental challenges. Additionally, data fusion from wheel encoders and IMU enhances system robustness by improving motion estimation and reducing drift. To validate AVM-SLAM's efficacy and robustness, we provide a large-scale, high-resolution underground garage dataset, available at https://github.com/yale-cv/avm-slam. This dataset enables researchers to further explore and assess AVM-SLAM in similar environments.
Abstract:Compared with contact-based fingerprint acquisition techniques, contactless acquisition has the advantages of less skin distortion, larger fingerprint area, and hygienic acquisition. However, perspective distortion is a challenge in contactless fingerprint recognition, which changes ridge orientation, frequency, and minutiae location, and thus causes degraded recognition accuracy. We propose a learning based shape from texture algorithm to reconstruct a 3D finger shape from a single image and unwarp the raw image to suppress perspective distortion. Experimental results on contactless fingerprint databases show that the proposed method has high 3D reconstruction accuracy. Matching experiments on contactless-contact and contactless-contactless matching prove that the proposed method improves matching accuracy.
Abstract:Few-shot semantic segmentation aims at recognizing the object regions of unseen categories with only a few annotated examples as supervision. The key to few-shot segmentation is to establish a robust semantic relationship between the support and query images and to prevent overfitting. In this paper, we propose an effective Multi-similarity Hyperrelation Network (MSHNet) to tackle the few-shot semantic segmentation problem. In MSHNet, we propose a new Generative Prototype Similarity (GPS), which together with cosine similarity can establish a strong semantic relation between the support and query images. The locally generated prototype similarity based on global feature is logically complementary to the global cosine similarity based on local feature, and the relationship between the query image and the supported image can be expressed more comprehensively by using the two similarities simultaneously. In addition, we propose a Symmetric Merging Block (SMB) in MSHNet to efficiently merge multi-layer, multi-shot and multi-similarity hyperrelational features. MSHNet is built on the basis of similarity rather than specific category features, which can achieve more general unity and effectively reduce overfitting. On two benchmark semantic segmentation datasets Pascal-5i and COCO-20i, MSHNet achieves new state-of-the-art performances on 1-shot and 5-shot semantic segmentation tasks.
Abstract:Dense registration of fingerprints is a challenging task due to elastic skin distortion, low image quality, and self-similarity of ridge pattern. To overcome the limitation of handcraft features, we propose to train an end-to-end network to directly output pixel-wise displacement field between two fingerprints. The proposed network includes a siamese network for feature embedding, and a following encoder-decoder network for regressing displacement field. By applying displacement fields reliably estimated by tracing high quality fingerprint videos to challenging fingerprints, we synthesize a large number of training fingerprint pairs with ground truth displacement fields. In addition, based on the proposed registration algorithm, we propose a fingerprint mosaicking method based on optimal seam selection. Registration and matching experiments on FVC2004 databases, Tsinghua Distorted Fingerprint (TDF) database, and NIST SD27 latent fingerprint database show that our registration method outperforms previous dense registration methods in accuracy and efficiency. Mosaicking experiment on FVC2004 DB1 demonstrates that the proposed algorithm produced higher quality fingerprints than other algorithms which also validates the performance of our registration algorithm.