Xiangyu Yue

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Oct 28, 2024

BIFRÖST: 3D-Aware Image compositing with Language Instructions

Oct 24, 2024

Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant

Oct 17, 2024

Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling

Oct 14, 2024

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Oct 10, 2024

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Oct 02, 2024

Training Matting Models without Alpha Labels

Aug 20, 2024

Explore the Limits of Omni-modal Pretraining at Scale

Jun 13, 2024

SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information

Jun 11, 2024

EMR-Merging: Tuning-Free High-Performance Model Merging

May 23, 2024