
Xiaohan Ding

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines (Oct 28, 2024)

Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations (Oct 10, 2024)

CounterQuill: Investigating the Potential of Human-AI Collaboration in Online Counterspeech Writing (Oct 03, 2024)

Quantized Prompt for Efficient Generalization of Vision-Language Models (Jul 15, 2024)

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation (Apr 22, 2024)

Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language (Mar 01, 2024)

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions (Feb 05, 2024)

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities (Jan 25, 2024)

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation (Dec 14, 2023)

Online Vectorized HD Map Construction using Geometry (Dec 06, 2023)