Jianbing Zhang

Vision-Language Models Can Self-Improve Reasoning via Reflection

Oct 30, 2024

The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

Apr 19, 2024

MixRED: A Mix-lingual Relation Extraction Dataset

Mar 23, 2024

Cobra Effect in Reference-Free Image Captioning Metrics

Feb 18, 2024

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

Feb 15, 2024

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Jan 17, 2024

M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis

Oct 23, 2023

Bounding and Filling: A Fast and Flexible Framework for Image Captioning

Oct 15, 2023

DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking

Oct 09, 2023

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

Aug 06, 2023