Kejie Qiu

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

Mar 13, 2025

Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong (+1 more)

Abstract: Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation. Recent advances in 3D human reconstruction mainly focus on static human modeling, and their reliance on synthetic 3D scans for training limits their generalization ability. Conversely, optimization-based video methods achieve higher fidelity but demand controlled capture conditions and computationally intensive refinement processes. Motivated by the emergence of large reconstruction models for efficient static reconstruction, we propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with an attention mechanism, enabling detailed preservation of clothing geometry and texture. To further boost face identity preservation and fine detail recovery, we propose a head feature pyramid encoding scheme to aggregate multi-scale features of the head regions. Extensive experiments demonstrate that LHM generates plausible animatable humans in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.

* Project Page: https://lingtengqiu.github.io/LHM/

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Jul 26, 2022

Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Abstract: Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6-DoF (Degrees-of-Freedom) pose of a query image is estimated. Relocalization in large-scale indoor environments enables attractive applications such as augmented reality and robot navigation. However, appearance changes quickly in such environments as the camera moves, which is challenging for the relocalization system. To address this problem, we propose a virtual view synthesis-based approach, RenderNet, to enrich the database and refine poses for this particular scenario. Instead of rendering real images, which requires high-quality 3D models, we opt to directly render the needed global and local features of virtual viewpoints and apply them in the subsequent image retrieval and feature matching operations, respectively. The proposed method can largely improve the performance in large-scale indoor environments, e.g., achieving an improvement of 7.1% and 12.2% on the InLoc dataset.

AR Mapping: Accurate and Efficient Mapping for Augmented Reality

Mar 27, 2021

Rui Huang, Chuan Fang, Kejie Qiu, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Abstract: Augmented reality (AR) has gained increasing attention from both research and industry communities. By overlaying digital information and content onto the physical world, AR enables users to experience the world in a more informative and efficient manner. As a major building block for AR systems, localization aims at determining the device's pose from a pre-built "map" consisting of visual and depth information in a known environment. While the localization problem has been widely studied in the literature, the "map" for AR systems is rarely discussed. In this paper, we introduce the AR Map for a specific scene, composed of 1) color images with 6-DoF poses; 2) dense depth maps for each image; and 3) a complete point cloud map. We then propose an efficient end-to-end solution for generating and evaluating AR Maps. First, for efficient data capture, a backpack scanning device is presented with a unified calibration pipeline. Second, we propose an AR mapping pipeline that takes the input from the scanning device and produces accurate AR Maps. Finally, we present an approach to evaluating the accuracy of AR Maps with the help of the highly accurate reconstruction result from a high-end laser scanner. To the best of our knowledge, this is the first end-to-end solution for efficient and accurate mapping for AR applications.

* 8 pages, 14 figures

Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment

Mar 27, 2021

Kejie Qiu, Shenzhou Chen, Jiahui Zhang, Rui Huang, Le Cui, Siyu Zhu, Ping Tan

Abstract: Accurate localization is fundamental to a variety of applications, such as navigation, robotics, autonomous driving, and Augmented Reality (AR). Unlike incremental localization, global localization has no drift caused by error accumulation, which is desirable in many application scenarios. In addition to GPS used in the open air, 3D maps are also widely used as alternative global localization references. In this paper, we propose a compact 3D map-based global localization system using a low-cost monocular camera and an IMU (Inertial Measurement Unit). The proposed compact map consists of two types of simplified elements with multiple semantic labels, and it adapts well to various man-made environments such as urban environments. Also, semantic edge features are used for the key image-map registration, which is robust against occlusion and long-term appearance changes in the environments. To further improve the localization performance, the key semantic edge alignment is formulated as an optimization problem based on initial poses predicted by an independent VIO (Visual-Inertial Odometry) module. The localization system is realized with a modular design and runs in real time. We evaluate the localization accuracy through real-world experimental results compared with ground truth; long-term localization performance is also demonstrated.

Decentralized Visual-Inertial-UWB Fusion for Relative State Estimation of Aerial Swarm

Mar 11, 2020

Hao Xu, Luqi Wang, Yichen Zhang, Kejie Qiu, Shaojie Shen

Abstract: The collaboration of unmanned aerial vehicles (UAVs) has become a popular research topic for its practicability in multiple scenarios. The collaboration of multiple UAVs, also known as an aerial swarm, is a highly complex system that still lacks a state-of-the-art decentralized relative state estimation method. In this paper, we present a novel fully decentralized visual-inertial-UWB fusion framework for relative state estimation and demonstrate its practicability by performing extensive aerial swarm flight experiments. Comparison with ground truth data from the motion capture system shows centimeter-level precision, which outperforms all the Ultra-WideBand (UWB) and even vision-based methods. The system is not limited by the field of view (FoV) of the camera or by the Global Positioning System (GPS). Moreover, on account of its estimation consistency, we believe that the proposed relative state estimation framework has the potential to be widely adopted by aerial swarm applications in different scenarios and at multiple scales.

* Accepted ICRA 2020

Estimating Metric Poses of Dynamic Objects Using Monocular Visual-Inertial Fusion

Aug 21, 2018

Kejie Qiu, Tong Qin, Hongwen Xie, Shaojie Shen

Abstract: A monocular 3D object tracking system generally has only up-to-scale pose estimation results without any prior knowledge of the tracked object. In this paper, we propose a novel idea to recover the metric scale of an arbitrary dynamic object by optimizing the trajectory of the object in the world frame, without motion assumptions. By introducing an additional constraint in the time domain, our monocular visual-inertial tracking system can obtain continuous six-degree-of-freedom (6-DoF) pose estimation without scale ambiguity. Our method requires neither fixed multi-camera nor depth sensor settings for scale observability; instead, the IMU inside the monocular sensing suite provides scale information for both the camera itself and the tracked object. We build the proposed system on top of our monocular visual-inertial system (VINS) to obtain accurate state estimation of the monocular camera in the world frame. The whole system consists of a 2D object tracker, an object region-based visual bundle adjustment (BA), VINS, and a correlation analysis-based metric scale estimator. Experimental comparisons with ground truth demonstrate the accuracy of our 3D tracking, while a mobile augmented reality (AR) demo shows the feasibility of potential applications.

* IROS 2018
