Abstract:Diffusion models have shown their remarkable ability to synthesize images, including the generation of humans in specific poses. However, current models face challenges in adequately expressing conditional control for detailed hand pose generation, leading to significant distortion in the hand regions. To tackle this problem, we first curate the How2Sign dataset to provide richer and more accurate hand pose annotations. In addition, we introduce adaptive, multi-modal fusion to integrate characters' physical features expressed in different modalities such as skeleton, depth, and surface normal. Furthermore, we propose a novel Region-Aware Cycle Loss (RACL) that enables the diffusion model training to focus on improving the hand region, resulting in improved quality of generated hand gestures. More specifically, the proposed RACL computes a weighted keypoint distance between the full-body pose keypoints from the generated image and the ground truth, to generate higher-quality hand poses while balancing overall pose accuracy. Moreover, we use two hand region metrics, named hand-PSNR and hand-Distance for hand pose generation evaluations. Our experimental evaluations demonstrate the effectiveness of our proposed approach in improving the quality of digital human pose generation using diffusion models, especially the quality of the hand region. The source code is available at https://github.com/fuqifan/Region-Aware-Cycle-Loss.
Abstract:This paper presents three open-source reinforcement learning environments developed on the MuJoCo physics engine with the Franka Emika Panda arm in MuJoCo Menagerie. Three representative tasks, push, slide, and pick-and-place, are implemented through the Gymnasium Robotics API, which inherits from the core of Gymnasium. Both the sparse binary and dense rewards are supported, and the observation space contains the keys of desired and achieved goals to follow the Multi-Goal Reinforcement Learning framework. Three different off-policy algorithms are used to validate the simulation attributes to ensure the fidelity of all tasks, and benchmark results are also given. Each environment and task are defined in a clean way, and the main parameters for modifying the environment are preserved to reflect the main difference. The repository, including all environments, is available at https://github.com/zichunxx/panda_mujoco_gym.
Abstract:Modular design is the foundation of on orbit construction technology of large space facilities in the future.Standard interface is the key technology of modular design of the future space robotic systems and space facilities.This paper presents the designed and tested of PetLock,a standard and genderless interface which can transfer mechanical loads,power and data between the future modular space robotic manipulator and spacecraft.PetLock adopts a completely genderless design,including connection face,locking mechanism,data and power interface.The connection surface provides a large translation and rotation misalignment tolerance,due to its 120-degree symmetrical and 3D shape design.The locking mechanism features the three locking pins retraction structure design,which is simple and reliable.POGO pin connectors in the center of the interface provides the power and data transfer capabilities.Due to the advantages of high locking force,large tolerance,high reliability and low cost,PetLock has the very big application potential in future on orbit construction missions.
Abstract:Forward and backward reaching inverse kinematics (FABRIK) is a heuristic inverse kinematics solver that is gradually applied to manipulators with the advantages of fast convergence and generating more realistic configurations. However, under the high error constraint, FABRIK exhibits unstable convergence behavior, which is unsatisfactory for the real-time motion planning of manipulators. In this paper, a novel inverse kinematics algorithm that combines FABRIK and the sequential quadratic programming (SQP) algorithm is presented, in which the joint angles deduced by FABRIK will be taken as the initial seed of the SQP algorithm to avoid getting stuck in local minima. The combined algorithm is evaluated with experiments, in which our algorithm can achieve higher success rates and faster solution times than FABRIK under the high error constraint. Furthermore, the combined algorithm can generate continuous trajectories for the UR5 and KUKA LBR IIWA 14 R820 manipulators in path tracking with no pose error and permitted position error of the end-effector.
Abstract:Learning reliable motion representation between consecutive frames, such as optical flow, has proven to have great promotion to video understanding. However, the TV-L1 method, an effective optical flow solver, is time-consuming and expensive in storage for caching the extracted optical flow. To fill the gap, we propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator. UF-TSN estimates motion cues from adjacent frames in a coarse-to-fine manner and focuses on small displacement for each level by extracting pyramid of feature and warping one to the other according to the estimated flow of the last level. Due to the lack of labeled motion for action datasets, we constrain the flow prediction with multi-scale photometric consistency and edge-aware smoothness. Compared with state-of-the-art unsupervised motion representation learning methods, our model achieves better accuracy while maintaining efficiency, which is competitive with some supervised or more complicated approaches.
Abstract:Optical flow estimation is an essential step for many real-world computer vision tasks. Existing deep networks have achieved satisfactory results by mostly employing a pyramidal coarse-to-fine paradigm, where a key process is to adopt warped target feature based on previous flow prediction to correlate with source feature for building 3D matching cost volume. However, the warping operation can lead to troublesome ghosting problem that results in ambiguity. Moreover, occluded areas are treated equally with non occluded regions in most existing works, which may cause performance degradation. To deal with these challenges, we propose a lightweight yet efficient optical flow network, named OAS-Net (occlusion aware sampling network) for accurate optical flow. First, a new sampling based correlation layer is employed without noisy warping operation. Second, a novel occlusion aware module is presented to make raw cost volume conscious of occluded regions. Third, a shared flow and occlusion awareness decoder is adopted for structure compactness. Experiments on Sintel and KITTI datasets demonstrate the effectiveness of proposed approaches.