Xiaodan Liang

ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions

Jan 21, 2025
Via arXiv

CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation

Jan 20, 2025
Via arXiv

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder

Dec 23, 2024
Via arXiv

Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism

Dec 13, 2024
Via arXiv

RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation

Dec 11, 2024
Via arXiv

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Dec 10, 2024
Via arXiv

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Dec 08, 2024
Via arXiv

EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation

Dec 06, 2024
Via arXiv

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

Dec 02, 2024
Via arXiv

InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models

Nov 18, 2024
Via arXiv