Wenwen Yu

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

Feb 22, 2025
Via arXiv

ClickTrack: Towards Real-time Interactive Single Object Tracking

Nov 24, 2024
Via arXiv

Click; Single Object Tracking; Video Object Segmentation; Real-time Interaction

Nov 20, 2024
Via arXiv

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Mar 28, 2024
Via arXiv

P2Seg: Pointly-supervised Segmentation via Mutual Distillation

Jan 18, 2024
Via arXiv

P2RBox: A Single Point is All You Need for Oriented Object Detection

Nov 22, 2023
Via arXiv

Turning a CLIP Model into a Scene Text Spotter

Aug 21, 2023
Via arXiv

Looking and Listening: Audio Guided Text Recognition

Jun 06, 2023
Via arXiv

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

Jun 05, 2023
Via arXiv

On the Hidden Mystery of OCR in Large Multimodal Models

May 13, 2023
Via arXiv