Jiannan Wu

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

Jun 12, 2024 (via arXiv)

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Apr 19, 2024 (via arXiv)

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Jan 15, 2024 (via arXiv)

UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

Dec 25, 2023 (via arXiv)

Exploring Transformers for Open-world Instance Segmentation

Aug 08, 2023 (via arXiv)

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

May 25, 2023 (via arXiv)

Multi-Level Contrastive Learning for Dense Prediction Task

Apr 04, 2023 (via arXiv)

Universal Instance Perception as Object Discovery and Retrieval

Mar 12, 2023 (via arXiv)

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

Oct 09, 2022 (via arXiv)

Language as Queries for Referring Video Object Segmentation

Jan 03, 2022 (via arXiv)