Alex Jinpeng Wang

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Jun 04, 2024
Via arXiv

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Jan 01, 2024
Via arXiv

Parrot Captions Teach CLIP to Spot Text

Dec 28, 2023
Via arXiv

UniVTG: Towards Unified Video-Language Temporal Grounding

Aug 18, 2023
Via arXiv

Too Large; Data Reduction for Vision-Language Pre-Training

Jun 01, 2023
Via arXiv

Position-guided Text Prompt for Vision-Language Pre-training

Dec 19, 2022
Via arXiv

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

Jul 04, 2022
Via arXiv

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

Jul 04, 2022
Via arXiv

Egocentric Video-Language Pretraining

Jun 03, 2022
Via arXiv

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

Apr 26, 2022
Via arXiv