Longteng Guo

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Oct 24, 2024
Via arXiv

Ada-K Routing: Boosting the Efficiency of MoE-based LLMs

Oct 15, 2024
Via arXiv

MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation

Oct 02, 2024
Via arXiv

OneDiff: A Generalist Model for Image Difference

Jul 08, 2024
Via arXiv

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

Jun 13, 2024
Via arXiv

Boter: Bootstrapping Knowledge Selection and Question Answering for Knowledge-based VQA

Apr 22, 2024
Via arXiv

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Mar 20, 2024
Via arXiv

VL-Mamba: Exploring State Space Models for Multimodal Learning

Mar 20, 2024
Via arXiv

Knowledge Condensation and Reasoning for Knowledge-based VQA

Mar 15, 2024
Via arXiv

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

Dec 13, 2023
Via arXiv