Xiaodan Liang

Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism

Dec 13, 2024

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Dec 11, 2024

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Dec 10, 2024

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Dec 08, 2024

EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

Dec 06, 2024

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

Dec 02, 2024

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Nov 18, 2024

InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models

Nov 18, 2024

VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation

Nov 14, 2024

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration

Nov 07, 2024