Zejun Li

EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models

Jun 09, 2024

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

May 28, 2024

DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning

Apr 02, 2024

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

Oct 17, 2023

A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training

Jun 11, 2022

MVP: Multi-Stage Vision-Language Pre-Training via Multi-Level Semantic Alignment

Jan 29, 2022

Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval

Nov 05, 2021

Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval

Sep 12, 2021

TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning

Jun 21, 2021

An Unsupervised Sampling Approach for Image-Sentence Matching Using Document-Level Structural Information

Mar 21, 2021