Gang Xiong

Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval

Oct 22, 2024

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

Oct 01, 2024

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Aug 21, 2024

IIU: Independent Inference Units for Knowledge-based Visual Question Answering

Aug 15, 2024

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

Jul 23, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Mar 20, 2024

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Mar 12, 2024

Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

Nov 10, 2023

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

Sep 28, 2023

Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

Sep 28, 2023