Joe Lin

Joint Optimization for 4D Human-Scene Reconstruction in the Wild

Jan 04, 2025

Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou

Abstract: Reconstructing human motion and its surrounding environment is crucial for understanding human-scene interaction and predicting human movements in the scene. While much progress has been made in capturing human-scene interaction in constrained environments, prior methods struggle to reconstruct natural and diverse human motion and scene context from web videos. In this work, we propose JOSH, a novel optimization-based method for 4D human-scene reconstruction in the wild from monocular videos. JOSH initializes from dense scene reconstruction and human mesh recovery, then leverages human-scene contact constraints to jointly optimize the scene, the camera poses, and the human motion. Experimental results show that, by jointly optimizing scene geometry and human motion, JOSH achieves better results on both global human motion estimation and dense scene reconstruction. We further design a more efficient model, JOSH3R, and train it directly with pseudo-labels from web videos. JOSH3R outperforms other optimization-free methods despite being trained only with labels predicted by JOSH, further demonstrating its accuracy and generalization ability.

* Project Page: https://genforce.github.io/JOSH/

Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels

Oct 10, 2024

Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou

Abstract: Understanding and modeling pedestrian movements in the real world is crucial for applications like motion forecasting and scene simulation. Many factors influence pedestrian movements, such as scene context, individual characteristics, and goals, which existing human generation methods often ignore. Web videos contain natural pedestrian behavior and rich motion context, but annotating them with pre-trained predictors leads to noisy labels. In this work, we propose learning diverse pedestrian movements from web videos. We first curate a large-scale dataset called CityWalkers that captures diverse real-world pedestrian movements in urban scenes. Then, based on CityWalkers, we propose a generative model called PedGen for diverse pedestrian movement generation. PedGen introduces automatic label filtering to remove low-quality labels and a mask embedding to train with partial labels. It also contains a novel context encoder that lifts the 2D scene context to 3D and can incorporate various context factors in generating realistic pedestrian movements in urban scenes. Experiments show that, by learning from noisy labels and incorporating the context factors, PedGen outperforms existing baseline methods for pedestrian movement generation. In addition, PedGen achieves zero-shot generalization in both real-world and simulated environments. The code, model, and data will be made publicly available on the project page (https://genforce.github.io/PedGen/).

* Project Page: https://genforce.github.io/PedGen/
