Mike Zheng Shou

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation

Dec 08, 2024
Via arXiv

ROICtrl: Boosting Instance Control for Visual Generation

Nov 27, 2024
Via arXiv

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024
Via arXiv

Factorized Visual Tokenization and Generation

Nov 25, 2024
Via arXiv

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Nov 22, 2024
Via arXiv

FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data

Nov 22, 2024
Via arXiv

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Nov 15, 2024
Via arXiv

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Nov 07, 2024
Via arXiv

Skinned Motion Retargeting with Dense Geometric Interaction Perception

Oct 28, 2024
Via arXiv

ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model

Oct 12, 2024
Via arXiv