Mingfei Gao

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Sep 30, 2024

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Jul 22, 2024

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Jun 14, 2024

4M: Massively Multimodal Masked Modeling

Dec 11, 2023

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Mar 29, 2023

ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding

Dec 10, 2022

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

Aug 14, 2022

Value Retrieval with Arbitrary Queries for Form-like Documents

Dec 15, 2021

Burn After Reading: Online Adaptation for Cross-domain Streaming Data

Dec 08, 2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes

Nov 18, 2021