Xuehai He

Mojito: Motion Trajectory and Intensity Control for Video Generation

Dec 12, 2024

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Oct 03, 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024

Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

May 30, 2024

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

May 08, 2024

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

Oct 14, 2023

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

Oct 05, 2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

May 24, 2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners

May 18, 2023

Multimodal Graph Transformer for Multimodal Question Answering

Apr 30, 2023