Mike Zheng Shou

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Nov 07, 2024
Via arXiv

Skinned Motion Retargeting with Dense Geometric Interaction Perception

Oct 28, 2024
Via arXiv

ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model

Oct 12, 2024
Via arXiv

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Oct 10, 2024
Via arXiv

Image Watermarks are Removable Using Controllable Regeneration from Clean Noise

Oct 07, 2024
Via arXiv

Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos

Oct 04, 2024
Via arXiv

High Quality Human Image Animation using Regional Supervision and Motion Blur Condition

Sep 29, 2024
Via arXiv

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Sep 29, 2024
Via arXiv

DOTA: Distributional Test-Time Adaptation of Vision-Language Models

Sep 28, 2024
Via arXiv

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Aug 29, 2024
Via arXiv