Yiyuan Zhang

Multimodal Long Video Modeling Based on Temporal Dynamic Context

Apr 14, 2025

Grasping by Spiraling: Reproducing Elephant Movements with Rigid-Soft Robot Synergy

Apr 02, 2025

DebiasDiff: Debiasing Text-to-image Diffusion Models with Self-discovering Latent Attribute Directions

Dec 25, 2024

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Oct 28, 2024

Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Oct 15, 2024

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Oct 10, 2024

Explore the Limits of Omni-modal Pretraining at Scale

Jun 13, 2024

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Feb 05, 2024

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Jan 25, 2024

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Dec 07, 2023