Mike Zheng Shou

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Dec 19, 2024

VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting

Dec 16, 2024

IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation

Dec 16, 2024

Anti-Reference: Universal and Immediate Defense Against Reference-Based Generation

Dec 08, 2024

ROICtrl: Boosting Instance Control for Visual Generation

Nov 27, 2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Nov 26, 2024

Factorized Visual Tokenization and Generation

Nov 25, 2024

FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data

Nov 22, 2024

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Nov 22, 2024

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

Nov 15, 2024