Yiyuan Zhang

DebiasDiff: Debiasing Text-to-image Diffusion Models with Self-discovering Latent Attribute Directions

Dec 25, 2024

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Oct 28, 2024

Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Oct 15, 2024

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Oct 10, 2024

Explore the Limits of Omni-modal Pretraining at Scale

Jun 13, 2024

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Feb 05, 2024

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Jan 25, 2024

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Dec 07, 2023

OneLLM: One Framework to Align All Modalities with Language

Dec 06, 2023

Online Vectorized HD Map Construction using Geometry

Dec 06, 2023