Xiaowei Chi

EVA: An Embodied World Model for Future Video Anticipation

Oct 20, 2024

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

Sep 16, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Jul 30, 2024

M-LRM: Multi-view Large Reconstruction Model

Jun 11, 2024

LLMs Meet Multimodal Generation and Editing: A Survey

May 29, 2024

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

May 27, 2024

DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

Feb 29, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Feb 25, 2024

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model

Nov 29, 2023

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

Nov 29, 2023