Chen Duan

Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding

Mar 18, 2025
Via arXiv

A Token-level Text Image Foundation Model for Document Understanding

Mar 04, 2025
Via arXiv

Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review

Feb 23, 2025
Via arXiv

InstructOCR: Instruction Boosting Scene Text Spotting

Dec 20, 2024
Via arXiv

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Mar 01, 2024
Via arXiv