Yiyuan Zhang

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Oct 28, 2024
Via arXiv

Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Oct 15, 2024
Via arXiv

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Oct 10, 2024
Via arXiv

Explore the Limits of Omni-modal Pretraining at Scale

Jun 13, 2024
Via arXiv

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Feb 05, 2024
Via arXiv

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Jan 25, 2024
Via arXiv

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Dec 07, 2023
Via arXiv

OneLLM: One Framework to Align All Modalities with Language

Dec 06, 2023
Via arXiv

Online Vectorized HD Map Construction using Geometry

Dec 06, 2023
Via arXiv

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Nov 27, 2023
Via arXiv