Abstract: With the rapid evolution of 3D generation algorithms, the cost of producing 3D humanoid character models has plummeted, yet the field is impeded by the lack of a comprehensive dataset for automatic rigging, a pivotal step in character animation. To address this gap, we present HumanRig, the first large-scale dataset specifically designed for 3D humanoid character rigging, comprising 11,434 meticulously curated T-posed meshes that adhere to a uniform skeleton topology. Building on this dataset, we introduce a data-driven automatic rigging framework that overcomes the limitations of GNN-based methods in handling complex AI-generated meshes. Our approach integrates a Prior-Guided Skeleton Estimator (PGSE) module, which uses 2D skeleton joints to provide a preliminary 3D skeleton, and a Mesh-Skeleton Mutual Attention Network (MSMAN), which fuses skeleton features with 3D mesh features extracted by a U-shaped point transformer. This enables coarse-to-fine 3D skeleton joint regression and robust skinning estimation, surpassing previous methods in quality and versatility. This work not only remedies the dataset deficiency in rigging research but also moves the animation industry towards more efficient and automated character rigging pipelines.
Abstract: Dam reservoirs play an important role in meeting sustainable development goals and global climate targets. However, consistent data on their geographical location are lacking, particularly for small dam reservoirs. A promising way to address this data gap is automated dam reservoir extraction from globally available remote sensing imagery. This can be framed as a fine-grained water body extraction task, which involves extracting water areas in images and then separating dam reservoirs from natural water bodies. We propose a novel deep neural network (DNN) based pipeline that decomposes dam reservoir extraction into water body segmentation and dam reservoir recognition. Water bodies are first separated from background land by a segmentation model, and each individual water body is then classified as either a dam reservoir or a natural water body by a classification model. For the former step, point-level metric learning with triplets across images is injected into the segmentation model to address contour ambiguities between water areas and land regions. For the latter step, prior-guided metric learning with triplets from clusters is injected into the classification model to optimize the image embedding space at a fine-grained level based on reservoir clusters. To facilitate future research, we establish a benchmark dataset with earth imagery data and human-labelled reservoirs from river basins in West Africa and India. Extensive experiments were conducted on this benchmark for the water body segmentation task, the dam reservoir recognition task, and the joint dam reservoir extraction task. Our method achieves superior performance compared with state-of-the-art approaches in each of these tasks.
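Both metric learning steps in the abstract above rely on triplets. As a minimal sketch of the general idea, the snippet below computes a standard margin-based triplet loss on toy embeddings. The abstract does not give the loss formulation, distance function, or margin, so the Euclidean distance, the margin of 1.0, and the names `euclidean` and `triplet_loss` are illustrative assumptions rather than the paper's method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows linearly.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy 2D embeddings (hypothetical values for illustration only).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

# Well-separated triplet: 1 - 5 + 1 is negative, so the loss is clipped to 0.
loss_easy = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin: 5 - 1 + 1 = 5.
loss_hard = triplet_loss(anchor, negative, positive)
```

In a setting like the one described, the anchor, positive, and negative could be point embeddings (segmentation step) or image embeddings (classification step); how the triplets are actually sampled is not specified in the abstract.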
Abstract: Gait recognition is one of the most important biometric technologies and has been applied in many fields. Recent gait recognition frameworks represent each human gait frame by descriptors extracted from either global appearances or local regions of humans. However, representations based on global information often neglect the details of the gait frame, while local region based descriptors cannot capture the relations among neighboring regions, which reduces their discriminativeness. In this paper, we propose a novel feature extraction and fusion framework to achieve discriminative feature representations for gait recognition. To this end, we take advantage of both global visual information and local region details and develop a Global and Local Feature Extractor (GLFE). Specifically, our GLFE module is composed of multiple newly designed global and local convolutional layers (GLConv) that combine global and local features in a principled manner. Furthermore, we present a novel operation, Local Temporal Aggregation (LTA), which preserves spatial information by reducing the temporal resolution in exchange for higher spatial resolution. With GLFE and LTA, our method significantly improves the discriminativeness of the visual features and thus the gait recognition performance. Extensive experiments demonstrate that our method outperforms state-of-the-art gait recognition methods on the widely used CASIA-B and OUMVLP datasets.
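The abstract above does not say how LTA is computed. As a rough illustration of what reducing temporal resolution can look like, the sketch below summarizes non-overlapping clips of consecutive frames with an element-wise maximum. The window size of 3, the max operator, and the name `aggregate_clips` are assumptions made for this example and are not taken from the paper; the spatial side of the trade-off is not modeled here.

```python
def aggregate_clips(frames, window=3):
    """Collapse each non-overlapping clip of `window` frames into one vector.

    `frames` is a list of per-frame feature vectors of equal length. Each
    output vector is the element-wise maximum over one clip. Trailing frames
    that do not fill a whole clip are dropped (an assumption of this sketch).
    """
    aggregated = []
    for start in range(0, len(frames) - window + 1, window):
        clip = frames[start:start + window]
        # zip(*clip) groups the values of each feature across the clip.
        aggregated.append([max(values) for values in zip(*clip)])
    return aggregated


# Six toy frames with two features each (hypothetical values).
frames = [[1, 0], [0, 2], [3, 1], [4, 4], [5, 0], [0, 6]]

# Temporal length drops from 6 to 2 while the feature size is unchanged.
clips = aggregate_clips(frames, window=3)
```

With the toy input, the first clip yields `[3, 2]` and the second `[5, 6]`, so the sequence is three times shorter while each position keeps its strongest response.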