Abstract:Reconstructing the hand mesh from one single RGB image is a challenging task because hands are often occluded by other objects. Most previous works attempt to explore more additional information and adopt attention mechanisms for improving 3D reconstruction performance, while it would increase computational complexity simultaneously. To achieve a performance-reserving architecture with high computational efficiency, in this work, we propose a simple but effective 3D hand mesh reconstruction network (i.e., HandS3C), which is the first time to incorporate state space model into the task of hand mesh reconstruction. In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field, extracts hand features in the spatial dimension, and enhances regional features of hands in the channel dimension. This helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets facing heavy occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandS3C achieves state-of-the-art performance while maintaining a minimal parameters.
Abstract:Reconstructing a hand mesh from a single RGB image is a challenging task because hands are often occluded by objects. Most previous works attempted to introduce more additional information and adopt attention mechanisms to improve 3D reconstruction results, but it would increased computational complexity. This observation prompts us to propose a new and concise architecture while improving computational efficiency. In this work, we propose a simple and effective 3D hand mesh reconstruction network HandSSCA, which is the first to incorporate state space modeling into the field of hand pose estimation. In the network, we have designed a novel state space channel attention module that extends the effective sensory field, extracts hand features in the spatial dimension, and enhances hand regional features in the channel dimension. This design helps to reconstruct a complete and detailed hand mesh. Extensive experiments conducted on well-known datasets featuring challenging hand-object occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that our proposed HandSSCA achieves state-of-the-art performance while maintaining a minimal parameter count.
Abstract:Modern object detectors take advantage of rectangular bounding boxes as a conventional way to represent objects. When it comes to fisheye images, rectangular boxes involve more background noise rather than semantic information. Although multi-point representation has been proposed, both the regression accuracy and convergence still perform inferior to the widely used rectangular boxes. In order to further exploit the advantages of multi-point representation for distorted images, Concentric Rectangles Regression Strategy(CRRS) is proposed in this work. We adopt smoother mean loss to allocate weights and discuss the effect of hyper-parameter to prediction results. Moreover, an accurate pixel-level method is designed to obtain irregular IoU for estimating detector performance. Compared with the previous work for muti-point representation, the experiments show that CRRS can improve the training performance both in accurate and stability. We also prove that multi-task weighting strategy facilitates regression process in this design.