Qingyun Li

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Feb 11, 2025

PointOBB-v3: Expanding Performance Boundaries of Single Point-Supervised Oriented Object Detection

Jan 23, 2025

A Simple Aerial Detection Baseline of Multimodal Language Models

Jan 16, 2025

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Dec 06, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 13, 2024

OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Jun 12, 2024

FLoRA: Low-Rank Core Space for N-dimension

May 23, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Feb 29, 2024

Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision

Nov 23, 2023

PointOBB: Learning Oriented Object Detection via Single Point Supervision

Nov 23, 2023