Huaijin Pi

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

Mar 19, 2025

Mocap-2-to-3: Lifting 2D Diffusion-Based Pretrained Models for 3D Motion Capture

Mar 05, 2025

Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation

Feb 27, 2025

Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation

Dec 17, 2024

World-Grounded Human Motion Recovery via Gravity-View Coordinates

Sep 10, 2024

Generating Human Motion in 3D Scenes from Text Descriptions

May 13, 2024

Hierarchical Generation of Human-Object Interactions with Diffusion Probabilistic Models

Oct 03, 2023

A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter

Feb 24, 2023

E-NeRV: Expedite Neural Video Representation with Disentangled Spatial-Temporal Context

Jul 17, 2022

Searching for TrioNet: Combining Convolution with Local and Global Self-Attention

Nov 15, 2021