Qunyi Xie

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

Jun 04, 2024
Via arXiv

MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary

Jul 24, 2023
Via arXiv

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

May 19, 2023
Via arXiv